I am using the code below to find a country (and sometimes state) for millions of GPS points. The code currently takes about one second per point, which is incredibly slow. The shapefile is 6 MB.
I read that geopandas uses rtrees for spatial joins, making them incredibly efficient, but this does not seem to work here. What am I doing wrong? I was hoping for a thousand points per second or so.
The shapefile and csv can be downloaded here (5MB): https://www.dropbox.com/s/gdkxtpqupj0sidm/SpatialJoin.zip?dl=0
import pandas as pd import geopandas as gpd from geopandas import GeoDataFrame, read_file from geopandas.tools import sjoin from shapely.geometry import Point, mapping,shape import time #parameters shapefile="K:/.../Shapefiles/Used/World.shp" df=pd.read_csv("K:/.../output2.csv",index_col=None,nrows=20)# Limit to 20 rows for testing if __name__=="__main__": start=time.time() df['geometry'] = df.apply(lambda z: Point(z.Longitude, z.Latitude), axis=1) PointsGeodataframe = gpd.GeoDataFrame(df) PolygonsGeodataframe = gpd.GeoDataFrame.from_file(shapefile) PointsGeodataframe.crs = PolygonsGeodataframe.crs print time.time()-start merged=sjoin(PointsGeodataframe, PolygonsGeodataframe, how='left') print time.time()-start merged.to_csv("K:/01. Personal/04. Models/10. Location/output.csv",index=None) print time.time()-start
أكثر...
I read that geopandas uses rtrees for spatial joins, making them incredibly efficient, but this does not seem to work here. What am I doing wrong? I was hoping for a thousand points per second or so.
The shapefile and csv can be downloaded here (5MB): https://www.dropbox.com/s/gdkxtpqupj0sidm/SpatialJoin.zip?dl=0
import pandas as pd import geopandas as gpd from geopandas import GeoDataFrame, read_file from geopandas.tools import sjoin from shapely.geometry import Point, mapping,shape import time #parameters shapefile="K:/.../Shapefiles/Used/World.shp" df=pd.read_csv("K:/.../output2.csv",index_col=None,nrows=20)# Limit to 20 rows for testing if __name__=="__main__": start=time.time() df['geometry'] = df.apply(lambda z: Point(z.Longitude, z.Latitude), axis=1) PointsGeodataframe = gpd.GeoDataFrame(df) PolygonsGeodataframe = gpd.GeoDataFrame.from_file(shapefile) PointsGeodataframe.crs = PolygonsGeodataframe.crs print time.time()-start merged=sjoin(PointsGeodataframe, PolygonsGeodataframe, how='left') print time.time()-start merged.to_csv("K:/01. Personal/04. Models/10. Location/output.csv",index=None) print time.time()-start
أكثر...