I found that a server with only two states' data loaded is much faster than a server with all states loaded. My theory is that badly formatted addresses, which don't get an exact hit at first, cost much more time when the geocoder checks all states. With only two states loaded, the search is limited and stops much earlier.
The geocode function has a restrict_region parameter that looks promising if it can limit the search range, since I have good reason to believe the state information in my input addresses is correct. I wrote a query that uses one state's geometry as the limiting parameter:
SELECT geocode('501 Fairmount DR , Annapolis, MD 20137', 1, the_geom) FROM tiger.state WHERE statefp = '24';
and compared its performance with the simple version:
SELECT geocode('501 Fairmount DR , Annapolis, MD 20137',1);
I didn't find any performance gain from the parameter. Instead, it lost the gain from caching, which usually comes from running the same query again immediately, because all the needed data has already been cached in RAM.
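One way to make this comparison more concrete is to wrap both queries in PostgreSQL's EXPLAIN (ANALYZE, BUFFERS), which reports execution time along with shared buffer hits and reads. This is only a sketch of how the measurement could be done, not a fix: it assumes the same tiger.state table and address as above, and the buffer counts are at best a rough hint of whether the data came from cache or from disk.

```sql
-- Sketch only: run each statement twice and compare the second runs,
-- so that both variants get the same chance to benefit from caching.

-- Variant 1: geocode restricted to one state's geometry (statefp '24').
EXPLAIN (ANALYZE, BUFFERS)
SELECT geocode('501 Fairmount DR , Annapolis, MD 20137', 1, the_geom)
FROM tiger.state
WHERE statefp = '24';

-- Variant 2: the simple call without restrict_region.
EXPLAIN (ANALYZE, BUFFERS)
SELECT geocode('501 Fairmount DR , Annapolis, MD 20137', 1);
```

If the "read" count stays high on the repeated run of the restricted variant, that would be consistent with the lost caching benefit described above, though it would not by itself explain the cause.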
Maybe my usage is wrong, or the parameter is not intended to work the way I expected. However, if the search range could be limited, the performance gain could be substantial: badly formatted addresses take the most time to geocode, and they also often disrupt the already cached data because the geocoder needs to search other states, even though all my input is in one state and all of that state's data could be cached in RAM.