Parsing addresses with ambiguous data

I have phone numbers and village names collected from villagers via forms. For various reasons, the data is inaccurate or incomplete.

The idea is to validate these two data points before adding them to the database/store.

  1. The phone numbers are formatted programmatically and validated via an external API, which also gives me the service provider and province information.
  2. The problem is with the addresses.
  • There is no standardized address line, and there is a lot of ambiguity.
  • Numeric street names and door numbers exist.
  • The input string sometimes contains an addressee.

Possible solutions I can think of

  • Reverse geocoding helps, but it is not very accurate in the Indian context. The Google TOS also prohibits automated queries (correct me if I'm wrong here).
  • Soundexing. Again, not very accurate with Indian data.
I understand it's difficult to parse such highly unstructured data, but I'm looking for ways to achieve at least enough accuracy to map addresses to the nearest point of interest.
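
To make the Soundex option concrete, here is a minimal sketch of classic American Soundex in Python. The function name `soundex` and the sample names are only illustrative, not taken from my data. It shows both the appeal (spelling variants collapse to one code) and the weakness (a different first letter breaks the match, and distinct names can collide).

```python
# Classic American Soundex: first letter + three digits.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of an ASCII name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]


# Spelling variants of the same name collapse to one code ...
print(soundex("Kothapalli"), soundex("Kottapalle"))
# ... but a different first letter breaks the match entirely,
print(soundex("Cuddapah"), soundex("Kadapa"))
# and distinct names can share a code.
print(soundex("Rampur"), soundex("Rampura"))
```

Soundex was designed around English surnames and only handles Latin letters, which is probably a large part of why it does poorly on transliterated Indian names.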

Queries

Given a village name from a villager who might misspell or abbreviate it, how do I get the correct official name of the village and its location?
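
One direction I can imagine, sketched below under heavy assumptions: fuzzy-match the raw input against a list of official village names, narrowed down by the province that the phone validation API already returns. The `GAZETTEER` contents, the `match_village` helper, and the 0.75 cutoff are all hypothetical placeholders; it also assumes the villager lives in the province the phone number is registered in, which will not always hold. Only `difflib` from the Python standard library is used.

```python
import difflib

# Hypothetical gazetteer: province -> official village names.
# A real one would have to come from an official source.
GAZETTEER = {
    "Andhra Pradesh": ["Kothapalli", "Kadapa", "Rampuram"],
    "Uttar Pradesh": ["Rampur", "Sitapur"],
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparison is not case-sensitive."""
    return " ".join(text.lower().split())


def match_village(raw: str, province: str, cutoff: float = 0.75):
    """Return up to 3 (official_name, similarity) pairs, best match first."""
    # Map normalized names back to their official spelling.
    candidates = {normalize(n): n for n in GAZETTEER.get(province, [])}
    query = normalize(raw)
    hits = difflib.get_close_matches(query, list(candidates), n=3, cutoff=cutoff)
    return [
        (candidates[h], round(difflib.SequenceMatcher(None, query, h).ratio(), 2))
        for h in hits
    ]


print(match_village("Kottapalle", "Andhra Pradesh"))
print(match_village("xyz", "Andhra Pradesh"))  # nothing similar enough
```

An empty result would mean the entry needs manual review rather than a silent guess, and the cutoff would need tuning against real data.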

Are there any ways to sanitize bad locations/addresses or decode complex or poorly formed addresses?

Are there any machine learning solutions that could help, so that the system learns from every computation? (I have zero knowledge of ML, so do correct me if I'm wrong here.)


