I have phone numbers and village names collected from villagers via forms. For various reasons, the data is often inaccurate or incomplete.
The idea is to validate these two data points before adding them to the database.
- The phone numbers are formatted programmatically and validated via an external API, which also gives me the service provider and province.
- The problem is with the addresses.
Numeric street names and door numbers exist.
Input string will sometimes contain an addressee.
Possible solutions I can think of
- Reverse geocoding helps, but it is not very accurate in the Indian context. The Google TOS also prohibits automated queries (correct me if I'm wrong here).
- Soundex matching. Again, not very accurate with Indian data.
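
For reference, here is a minimal sketch of the classic American Soundex encoding, to make concrete what kind of matching I mean. It is a plain textbook implementation, not taken from any particular library. One limitation it shows: the first letter is kept verbatim, so two spellings that start with different letters never receive the same code.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name."""
    # Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                # the first letter is kept as-is
    prev = codes.get(letters[0], "")   # digit of the previously seen letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal digits
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                    # a vowel resets prev, so repeats count again
    return (result + "000")[:4]        # pad or truncate to 4 characters


print(soundex("Kolkata"), soundex("Calcutta"))
```

The two spellings share the same digits but differ in the leading letter, so a plain Soundex lookup would not pair them.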
Queries
Given a village name from a villager who might misspell or abbreviate it, how do I get the correct official name and location of the village?
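
To illustrate the kind of lookup I have in mind, here is a rough sketch that uses Python's standard `difflib` to match a raw input against a list of official names. The `OFFICIAL_VILLAGES` list, the `match_village` helper, and the `0.7` cutoff are all made up for illustration; a real list would have to come from an official gazetteer, and could perhaps be narrowed to the province returned by the phone validation API.

```python
import difflib

# Hypothetical sample; a real list would come from an official gazetteer.
OFFICIAL_VILLAGES = ["Rampur", "Ramnagar", "Ramgarh", "Sultanpur", "Shivpuri"]


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def match_village(raw, candidates=OFFICIAL_VILLAGES, cutoff=0.7, n=3):
    """Return up to n official names similar to raw, best match first."""
    # Compare normalized forms, but return the official spelling.
    lookup = {normalize(c): c for c in candidates}
    hits = difflib.get_close_matches(normalize(raw), list(lookup), n=n, cutoff=cutoff)
    return [lookup[h] for h in hits]


print(match_village("Rampoor"))
```

An empty result would mean no candidate cleared the cutoff, in which case the entry could be flagged for manual review instead of being stored.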
Are there any ways to sanitize bad locations/addresses or decode complex or poorly formed addresses?
Are there any machine learning solutions that could help, so that the system learns from every computation? (I have zero knowledge of ML, so do correct me if I'm wrong here.)