Background.
I have several annual datasets which originated from a government planning agency. They were supplied in MapInfo TAB format, and are (notionally) all GDA94 (EPSG:4283), although I doubt it.
It's my normal work practice to ogr2ogr the TAB files into a PostgreSQL/PostGIS database and do my GIS-ish misbehavings to the data in a proper analytical environment.
When I query the files using ogrinfo -so -al, it returns a kind of scrambled OGC-WKT... e.g.,
Layer SRS WKT:
GEOGCS["unnamed", DATUM["GDA94", SPHEROID["GRS 80",6378137,298.257222101], TOWGS84[0,0,0,0,0,0,0]], PRIMEM["Greenwich",0], UNIT["degree",0.0174532925199433]]
Close enough for government work, I guess: it contains all the relevant bits and they are correct.
However, when I use the data (in PostGIS, QGIS or anywhere else), polygons that ought to overlap (either exactly or to a very large extent) do not. And by 'do not' I mean 'not in a way that's consistent with a slightly-wrong SRS'.
Below are two examples. In both examples:
- the grey cadastre is the 2015 version (from a different source, verified and with a complete EPSG SRS-WKT);
- the maroon is the 'problem' data (from 2009);
- the white dotted boundaries are the boundaries for the 2009 cadastre.

Example 2 is from the same locality, roughly 1.4 km to the north-west. In this case the vertices only differ by 1.4 m and 0.4 m (note that not even the ratio of the drift extents is the same).
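To compare drift between localities, it can help to express vertex differences in metres rather than degrees. The following is only a rough sketch, assuming a local spherical approximation that reuses the GRS 80 semi-major axis from the layer's WKT; the function name `offset_metres` and the sample coordinates are hypothetical and not taken from the data.

```python
import math

# Semi-major axis in metres, as listed in the layer's SPHEROID["GRS 80", ...] entry.
GRS80_A = 6378137.0


def offset_metres(lon_a, lat_a, lon_b, lat_b):
    """Approximate (east, north) offset in metres from vertex A to vertex B.

    Assumes a sphere of radius GRS80_A and a locally flat patch, which is
    only reasonable for offsets of a few metres to a few kilometres.
    """
    mean_lat = math.radians((lat_a + lat_b) / 2.0)
    east = math.radians(lon_b - lon_a) * GRS80_A * math.cos(mean_lat)
    north = math.radians(lat_b - lat_a) * GRS80_A
    return east, north


# Hypothetical pair of matching vertices (2009 vs 2015), in degrees.
east, north = offset_metres(145.00000, -37.80000, 145.00001, -37.80001)
print("east offset (m):", east, "north offset (m):", north)
```

If the (east, north) pairs computed this way differ from one locality to the next, a single global translation cannot be the whole story.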

I should point out, too, that some lots in this dataset are shifted to the southeast, and others to the northwest.
Problem Statement
The aim is to move the lot from 2009 so that it is situated more precisely relative to its corresponding 2015 lot/s (there will be multiple 2015 lots when a lot has been subdivided).
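For the simplest case (one 2009 lot, one 2015 lot, no subdivision), the move could be a plain translation by the difference between the two centroids. This is a minimal sketch under that assumption, using bare coordinate lists rather than PostGIS geometries; `ring_area_centroid` and `shift_to_match` are hypothetical helper names, and the sample rings are made up.

```python
def ring_area_centroid(ring):
    """Signed area and centroid of a simple polygon ring [(x, y), ...].

    Uses the shoelace formula; works whether or not the ring repeats its
    first vertex at the end.
    """
    area2 = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = area2 / 2.0
    return area, (cx / (6.0 * area), cy / (6.0 * area))


def shift_to_match(old_ring, new_ring):
    """Translate old_ring so that its centroid coincides with new_ring's centroid.

    Returns the translated ring and the (dx, dy) that was applied.
    """
    _, (ox, oy) = ring_area_centroid(old_ring)
    _, (nx, ny) = ring_area_centroid(new_ring)
    dx, dy = nx - ox, ny - oy
    return [(x + dx, y + dy) for x, y in old_ring], (dx, dy)


# Hypothetical lots: the 2009 ring is the 2015 ring displaced by (1.4, 0.4).
lot_2015 = [(0.0, 0.0), (10.0, 0.0), (10.0, 30.0), (0.0, 30.0)]
lot_2009 = [(x + 1.4, y + 0.4) for x, y in lot_2015]
moved, applied = shift_to_match(lot_2009, lot_2015)
```

A centroid-based shift is only meaningful once the correct 2015 counterpart is known, and it does not apply directly when the 2009 lot has been subdivided.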
The lots have persistent aspatial identifiers, but these change when a parcel is subdivided (as in the second example).
The second example is artificially easy, since the large 2009 parcel has large proportional ST_Intersection()s with the 2015 ones, and for all 2015 lots the 'correct' 2009 lot is also the lot with the largest ST_Intersection().
Now look back at example 1: there has been no subdivision, and for some of the 2015 parcels the 2009 lot with the largest intersection is not the right one. For example, the red-outlined lot: the 2009 lot with max(ST_Intersection()) is the wrong 2009 lot (the correct 2009 lot is highlighted in grey).
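The failure mode can be reproduced with toy geometry. The sketch below assumes axis-aligned rectangular lots given as (xmin, ymin, xmax, ymax) and a hypothetical shift larger than half a lot width; none of the numbers come from the real cadastre, and `rect_intersection_area` merely stands in for the area of ST_Intersection().

```python
def rect_intersection_area(a, b):
    """Overlap area of two axis-aligned rectangles (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def max_intersection_match(target, candidates):
    """Return the id of the candidate with the largest overlap with target, or None."""
    best_id, best_area = None, 0.0
    for cand_id, rect in candidates.items():
        area = rect_intersection_area(target, rect)
        if area > best_area:
            best_id, best_area = cand_id, area
    return best_id


# Three hypothetical narrow lots, 10 units wide and 30 deep, side by side.
lots_2015 = {"A": (0, 0, 10, 30), "B": (10, 0, 20, 30), "C": (20, 0, 30, 30)}

# The same lots as recorded in 2009, displaced east by 6 units (over half a width).
SHIFT = 6.0
lots_2009 = {
    lot_id: (x0 + SHIFT, y0, x1 + SHIFT, y1)
    for lot_id, (x0, y0, x1, y1) in lots_2015.items()
}

# The 2015 lot "B" overlaps the shifted 2009 lot "A" by 6 x 30 = 180,
# but its true counterpart "B" by only 4 x 30 = 120.
wrong_match = max_intersection_match(lots_2015["B"], lots_2009)
```

Here `wrong_match` is the 2009 neighbour rather than the true counterpart, which is the same kind of mismatch as the red-outlined lot.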
Where the aspatial identifiers persist across time, it's simple enough to snap the 2009 lots to their 2015 counterparts; where the 2015 and 2009 lots intersect so that the correct lot is the max intersection, that's also straightforward.
But when (1) the old lot has been subdivided and (2) the shift makes the correct 2009 lot not the max-intersection lot... that's the nub of the problem, and the thing I would like to solve.
I can 'cascade' the two processes:
- first, find all lots where the identifier didn't change;
- then, for each lot not matched in the first step, find the max-intersection lot whose area, perimeter and roundness are the same and whose bilateral intersection after snapping is close to 100%.
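The cascade could be sketched roughly as follows. This is an illustration under several assumptions, not a tested solution: lots are plain dicts of id to coordinate ring, roundness is taken to be 4*pi*area/perimeter^2, `bbox_overlap_area` is a hypothetical stand-in for the area of ST_Intersection() that is exact only for axis-aligned rectangles, 2009 lots already claimed by identifier are excluded from the second step, and the post-snap bilateral-intersection check is left out.

```python
import math


def shape_signature(ring):
    """Return (area, perimeter, roundness) for a simple ring [(x, y), ...]."""
    twice_area = perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2.0
    return area, perimeter, 4.0 * math.pi * area / perimeter ** 2


def similar(sig_a, sig_b, rel_tol):
    """True when every component of the two signatures agrees within rel_tol."""
    return all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(sig_a, sig_b))


def bbox_overlap_area(ring_a, ring_b):
    """Overlap area of the two rings' bounding boxes (stand-in for ST_Intersection)."""
    xs_a, ys_a = zip(*ring_a)
    xs_b, ys_b = zip(*ring_b)
    width = min(max(xs_a), max(xs_b)) - max(min(xs_a), min(xs_b))
    height = min(max(ys_a), max(ys_b)) - max(min(ys_a), min(ys_b))
    return width * height if width > 0 and height > 0 else 0.0


def cascade_match(old_lots, new_lots, overlap_area=bbox_overlap_area, rel_tol=0.01):
    """Map each new lot id to an old lot id, or to None when neither step matches."""
    matches = {}
    # Old lots whose identifier persists are claimed in step 1 (an assumption).
    unclaimed = {k: v for k, v in old_lots.items() if k not in new_lots}
    for new_id, new_ring in new_lots.items():
        # Step 1: the aspatial identifier did not change.
        if new_id in old_lots:
            matches[new_id] = new_id
            continue
        # Step 2: take the max-intersection old lot, accept it only if shapes agree.
        best_id, best_area = None, 0.0
        for old_id, old_ring in unclaimed.items():
            area = overlap_area(old_ring, new_ring)
            if area > best_area:
                best_id, best_area = old_id, area
        accepted = best_id is not None and similar(
            shape_signature(unclaimed[best_id]), shape_signature(new_ring), rel_tol
        )
        matches[new_id] = best_id if accepted else None
    return matches


# Hypothetical data: lot "1" keeps its id; old lot "2" reappears, shifted, as "9".
old = {"1": [(0, 0), (10, 0), (10, 10), (0, 10)],
       "2": [(11.4, 0.4), (21.4, 0.4), (21.4, 10.4), (11.4, 10.4)]}
new = {"1": [(0, 0), (10, 0), (10, 10), (0, 10)],
       "9": [(10, 0), (20, 0), (20, 10), (10, 10)],
       "10": [(20, 0), (25, 0), (25, 10), (20, 10)]}
result = cascade_match(old, new)
```

Lots that come back as None are exactly the residue the cascade cannot resolve, such as subdivided parcels whose shape no longer matches any 2009 lot.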
It would be ideal if the mechanism for correction was able to be automated, since this problem will arise for 79 different regions each year for 11 years, for a total of roughly 40 million individual lots-by-year that might have shifted relative to themselves, their parent or their children - and only a subset of those will be resolved by cascading the aspatial and spatial comparisons.
Environment
- Python 3.4 and 2.7.10 (with GDAL bindings, of course);
- GDAL 1.11;
- PostgreSQL 9.3 (and 9.4 if required) with PostGIS 2.1.7;
- QGIS 2.10 and 2.12
I also have MapInfo 'Professional' [ha!] 12.5 and 15 (in case the optimal solution calls for an expensive, low-productivity, proprietary, one-platform environment).
I almost got to the end without snarking on the Windows 8 of GIS software...