Weighted zonal statistics in Python and/or R

I have raster files with monthly precipitation data (4x4 km) for about 100 years, and a shapefile for a watershed (which contains several polygons as sub-basins). I need the average precipitation over the watershed for each month. I don't want to do this by hand in ArcMap because it will probably need to be repeated many times (for other watersheds, and when more data becomes available), so I am coding it.

I have now written code in R, using the raster and rgdal packages. It works great in the sense that it can calculate a weighted average per sub-basin: it checks which fraction of each raster cell falls within the polygon and uses that fraction to weight the average for the polygon. Because some of the sub-basins are small compared to the raster cells, this is a great feature. In addition, because I can access shapefile attributes such as the area of each sub-basin, I can calculate a weighted average for the whole watershed.

However, this is slow; the description of the extract() function even says so. Calculating the weighted values takes 4-5 seconds, so when I ran the script on 30 years' worth of monthly data it took 37 minutes (there are a few more calculations besides the weighted average).

I have a feeling it might be faster in Python, but I am not sure how to approach this. I did get started on it before writing the R code (see below) but shifted my attention to R: it seemed a bit daunting, since it has been a while since I last coded in Python and I have never used Python for geospatial analysis.

Also, an initial test seems to show that in Python (with rasterstats) no weighted averaging happens, and that small polygons which do not contain the center of a raster cell get assigned 0 or None.
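One thing I noticed while reading the rasterstats docs is an all_touched option, which, if I understand it correctly, includes every cell touched by a polygon instead of only cells whose center lies inside it. That would not be a true area-weighted average, but it might at least stop the small polygons from coming back as None. A minimal sketch of what I mean, using the same files as below:

from rasterstats import zonal_stats

basin = "Steinhatchee_Project.shp"
rain = "PRISM_ppt_stable_4kmM3_201504_bil.bil"

# all_touched=True includes every cell touched by the polygon boundary,
# not just cells whose center point falls inside the polygon.
# Not area-weighted, but small polygons should no longer return None.
stats = zonal_stats(basin, rain, all_touched=True)
means = [f['mean'] for f in stats]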

So my questions are:

  1. If weighted averaging is possible in Python, will it be faster there than in R?
  2. How do I access the other data associated with the shapefile (such as the sub-basin areas)? See the sketch just below this list for the kind of thing I mean.
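For question 2, my guess is that a library like fiona (which I believe rasterstats uses to read the shapefile anyway) can read the attribute table, and that the per-polygon means from zonal_stats could then be combined into an area-weighted mean for the whole watershed, similar to what I do in R. A rough sketch of what I have in mind; the field name "AREA" is made up and would have to be replaced by whatever is actually in Steinhatchee_Project.shp:

import fiona
from rasterstats import zonal_stats

basin = "Steinhatchee_Project.shp"
rain = "PRISM_ppt_stable_4kmM3_201504_bil.bil"

# Read one attribute per sub-basin; "AREA" is a placeholder field name.
with fiona.open(basin) as src:
    areas = [feat['properties']['AREA'] for feat in src]

# Per-polygon means (some may come back as None for the small polygons).
means = [f['mean'] for f in zonal_stats(basin, rain)]

# Area-weighted mean over the whole watershed, skipping polygons with no value.
pairs = [(m, a) for m, a in zip(means, areas) if m is not None]
weighted_mean = sum(m * a for m, a in pairs) / sum(a for m, a in pairs)
print(weighted_mean)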
Below is the very very basic code I have so far.

from rasterstats import zonal_stats
import gdal
import numpy as np

basin = "Steinhatchee_Project.shp"
rain = "PRISM_ppt_stable_4kmM3_201504_bil.bil"

# NOTE from the rasterstats documentation: "There is no right or wrong way to rasterize a vector.
# The default strategy is to include all pixels along the line render path (for lines),
# or cells where the center point is within the polygon (for polygons)."

# get the average for every polygon (there are 19)
stats = zonal_stats(basin, rain)
data = [f['mean'] for f in stats]

# take out the "None" values...
data2 = [x for x in data if x is not None]

# ... so I can calculate the mean for the whole watershed
np.mean(data2)

data looks as follows (note the None values):

Out[76]: [115.64999389648438, 83.07333374023438, 92.33000183105469, 73.35416666666667, 97.36499786376953, 92.03500366210938, 80.30999755859375, 102.8699951171875, 94.3499984741211, 110.99250030517578, 80.18000030517578, 67.6749979654948, 107.825, 82.63999938964844, 88.375, 59.769999186197914, 63.48249816894531, None, None]

My system specs: Windows 7, Python 2.7.9 (32-bit, using the Enthought Canopy IDE), GDAL 1.11.3, rasterio 0.29.0, Shapely 1.5.13.


