This involves a lot of code; if it's better suited for StackOverflow, please let me know!
I'm working on a project involving trips that are constructed from GPS coordinates. I'm mapping the recorded trips on a webpage, with version 0.35 of Windshaft. Trips are collected from a mobile application, and the GPS data is inserted into a Postgres (9.3) database (with PostGIS (2.1.2) installed). The relevant tables are:
coord_geog: Column | Type |---------+-----------------------------|id | integer | trip_id | integer |recorded | timestamp without time zone |geog | geography(Point,4326) |geom | geometry(Point,4326) |next | integer |trip_geom: Column | Type |---------+-----------------------------|id | integer |purpose | character varying(255) |start | timestamp without time zone |stop | timestamp without time zone |geom | geometry(LineString,4326) |The trip_id column of coord_geog corresponds to the id column of trip_geom. The application supplies data for the id, trip_id, recorded, and geog columns of coord_geom, and the id, purpose, start, and stop columns of trip_geom. I'm actually not sure what the geom column of coord_geog is used for, but I'm too scared to remove it (and it doesn't appear to be used in anything mentioned in this question).
I'm not very sure how the whole Windshaft thing works, but if it helps, the Windshaft configuration is currently set as:
req.params.sql = "(select * from trip_geom_frag where purpose ilike '" + req.params.purpose + "%') as trip_geom_frag";req.params = _.extend({}, req.params, {style: style});I'll specify trip_geom_frag lower down - it is similar to trip_geom.
The specific task I'm trying to improve is constructing a LineString path from all of the points in a path.
I inherited a Python script that's supposed to handle this task. Although I'm not particularly well-versed in Python or SQL, I recognized that a lot of things that the script was doing could be done as pure SQL, which resulted in a major speedup at first. After the points are anonymized (trimmed from each end of a trip), a query like this constructs the LineString:
UPDATE trip_geom t SET geom=(SELECT ST_MakeLine(line.geom) FROM (SELECT c.recorded, c.geom FROM coord_geog c WHERE c.trip_id=t.id ORDER BY c.recorded ASC) as line) WHERE t.id=%s;where the %s parameter is an id field from a row in trip_geom. This constructs a LineString connecting the points in the trip. However, there tend to be errors in the GPS data, where a bad GPS fix causes one or more points to be incorrectly located far away, or the recording is paused and later resumed, resulting in a large 'jump' in the trip.
Valid points are generally close together (since the trips are bicycle trips at a relatively low speed), so when a trip has all valid GPS data, the LineString is a fairly smooth plot (map background by CartoDB):
However, many trips (especially the long ones) have jumps or errors, resulting in trips that look like this:
While I'm sure there are some highly accomplished bicyclists, I doubt that someone was able to bike straight across a river; the straight line appears to be caused by the bicyclist pausing and resuming a trip recording. These errors aren't a big deal when isolated, but when there are many of them, the map of trips becomes very cluttered with straight lines that cross over large areas of the map.
Therefore, the specific task I am attempting to accomplish is to generate LineString paths without adjacent points that are very far apart. I imagine that this is a task well suited to PostGIS (or SQL in general), but I'm having difficulty finding a solution.
My first attempted solution was to remove points that were far from the previous point. Since I don't have the exact SQL any more, here is the procedure in pseudo-code:
for t in trips: i = 1 while i < t.points.length: current_pt = t.points last_pt = t.points[i-1] if ! ST_DWithin( current_pt, last_pt, 100 ): delete_point( current_pt ) else: i = i + 1This resulted in smooth trips, but the trips with jumps in the GPS data were truncated after the jump. For example, in a trip with a jump between the trip's first and second points, all of the points in the trip after the first point would be removed.
My second (and current) solution is to create a second table like trip_geom to hold LineStrings from trips, which are separated whenever there's a jump between points. I decided to call these fragmented paths - there's probably a better term. Here's the new table I created:
trip_geom_frag: Column | Type |---------+-----------------------------|id | integer |geom | geometry(LineString,4326) |purpose | character varying(255) |orig_trip| integer |Then, using the next column of coord_geog (which I added), I use a query like this to construct a sort of linked list of the points in each trip, so that a -1 value represents either a jump in the points or the end of a trip:
UPDATE coord_geog current SET next = COALESCE((SELECT id FROM coord_geog WHERE trip_id=current.trip_id AND recorded>current.recorded AND ST_DWithin(current.geog, geog, 100) ORDER BY id ASC LIMIT 1), -1);Then, in Python, I insert a row into trip_geom_frag for each segment of a trip:
for t in trips: trip_id = t.id c.execute('SELECT id, next FROM coord_geog WHERE trip_id=%s ORDER BY id ASC;', (trip_id,)); coords = c.fetchall() ind = [coord[1] for coord in coords] while len(coords) > 0: i = ind.index(-1) c.execute('INSERT INTO trip_geom_frag (geom, orig_trip, purpose) SELECT ST_MakeLine(line.geom), %s, %s FROM (SELECT c.geom FROM coord_geog c WHERE c.trip_id=%s AND c.id >= %s AND c.id
I'm working on a project involving trips that are constructed from GPS coordinates. I'm mapping the recorded trips on a webpage, with version 0.35 of Windshaft. Trips are collected from a mobile application, and the GPS data is inserted into a Postgres (9.3) database (with PostGIS (2.1.2) installed). The relevant tables are:
coord_geog: Column | Type |---------+-----------------------------|id | integer | trip_id | integer |recorded | timestamp without time zone |geog | geography(Point,4326) |geom | geometry(Point,4326) |next | integer |trip_geom: Column | Type |---------+-----------------------------|id | integer |purpose | character varying(255) |start | timestamp without time zone |stop | timestamp without time zone |geom | geometry(LineString,4326) |The trip_id column of coord_geog corresponds to the id column of trip_geom. The application supplies data for the id, trip_id, recorded, and geog columns of coord_geom, and the id, purpose, start, and stop columns of trip_geom. I'm actually not sure what the geom column of coord_geog is used for, but I'm too scared to remove it (and it doesn't appear to be used in anything mentioned in this question).
I'm not very sure how the whole Windshaft thing works, but if it helps, the Windshaft configuration is currently set as:
req.params.sql = "(select * from trip_geom_frag where purpose ilike '" + req.params.purpose + "%') as trip_geom_frag";req.params = _.extend({}, req.params, {style: style});I'll specify trip_geom_frag lower down - it is similar to trip_geom.
The specific task I'm trying to improve is constructing a LineString path from all of the points in a path.
I inherited a Python script that's supposed to handle this task. Although I'm not particularly well-versed in Python or SQL, I recognized that a lot of things that the script was doing could be done as pure SQL, which resulted in a major speedup at first. After the points are anonymized (trimmed from each end of a trip), a query like this constructs the LineString:
UPDATE trip_geom t SET geom=(SELECT ST_MakeLine(line.geom) FROM (SELECT c.recorded, c.geom FROM coord_geog c WHERE c.trip_id=t.id ORDER BY c.recorded ASC) as line) WHERE t.id=%s;where the %s parameter is an id field from a row in trip_geom. This constructs a LineString connecting the points in the trip. However, there tend to be errors in the GPS data, where a bad GPS fix causes one or more points to be incorrectly located far away, or the recording is paused and later resumed, resulting in a large 'jump' in the trip.
Valid points are generally close together (since the trips are bicycle trips at a relatively low speed), so when a trip has all valid GPS data, the LineString is a fairly smooth plot (map background by CartoDB):

However, many trips (especially the long ones) have jumps or errors, resulting in trips that look like this:

While I'm sure there are some highly accomplished bicyclists, I doubt that someone was able to bike straight across a river; the straight line appears to be caused by the bicyclist pausing and resuming a trip recording. These errors aren't a big deal when isolated, but when there are many of them, the map of trips becomes very cluttered with straight lines that cross over large areas of the map.
Therefore, the specific task I am attempting to accomplish is to generate LineString paths without adjacent points that are very far apart. I imagine that this is a task well suited to PostGIS (or SQL in general), but I'm having difficulty finding a solution.
My first attempted solution was to remove points that were far from the previous point. Since I don't have the exact SQL any more, here is the procedure in pseudo-code:
for t in trips: i = 1 while i < t.points.length: current_pt = t.points last_pt = t.points[i-1] if ! ST_DWithin( current_pt, last_pt, 100 ): delete_point( current_pt ) else: i = i + 1This resulted in smooth trips, but the trips with jumps in the GPS data were truncated after the jump. For example, in a trip with a jump between the trip's first and second points, all of the points in the trip after the first point would be removed.
My second (and current) solution is to create a second table like trip_geom to hold LineStrings from trips, which are separated whenever there's a jump between points. I decided to call these fragmented paths - there's probably a better term. Here's the new table I created:
trip_geom_frag: Column | Type |---------+-----------------------------|id | integer |geom | geometry(LineString,4326) |purpose | character varying(255) |orig_trip| integer |Then, using the next column of coord_geog (which I added), I use a query like this to construct a sort of linked list of the points in each trip, so that a -1 value represents either a jump in the points or the end of a trip:
UPDATE coord_geog current SET next = COALESCE((SELECT id FROM coord_geog WHERE trip_id=current.trip_id AND recorded>current.recorded AND ST_DWithin(current.geog, geog, 100) ORDER BY id ASC LIMIT 1), -1);Then, in Python, I insert a row into trip_geom_frag for each segment of a trip:
for t in trips: trip_id = t.id c.execute('SELECT id, next FROM coord_geog WHERE trip_id=%s ORDER BY id ASC;', (trip_id,)); coords = c.fetchall() ind = [coord[1] for coord in coords] while len(coords) > 0: i = ind.index(-1) c.execute('INSERT INTO trip_geom_frag (geom, orig_trip, purpose) SELECT ST_MakeLine(line.geom), %s, %s FROM (SELECT c.geom FROM coord_geog c WHERE c.trip_id=%s AND c.id >= %s AND c.id