All of a sudden, I was in need to perform a point in polygon operation to find out in which polygon a bunch of point locations (both stored in a shapefile) could be found, if any. This is a well known problem and has well-known solutions (e.g. ArcGIS or QGIS). But this time I didn’t have the costly Arc at hand and for the problem I’m facing (around 60k points and +800 polygons) QGIS wouldn’t work in a reasonable time span. Plus, it would be nice to be able to leave a code trace in the form of a script that I could check 6 months down the road when I need to refresh my memory about what I did.
Luckily, PySAL has nice functionality to check whether a point is within a polygon or not, so all I had to do basically is to writie a wrapper function that takes the core functionality from PySAL and presents it to an end-user in an easy way for this particular task. It turns out I wrote not one function but two,
pip_shps_multi. The first one uses pandas and, based on my limited experiments, is a little faster on one core. For some reason, it is a bit harder given my limited expertise to parallelize it, so I wrote another one that uses the ‘rtree’ in PySAL and works on several cores. The API and functionality is the same, so you pick your flavour depending on whether you have one very fast core or many not that fast ones. The code is documented so it should be self-explaining, but just to give you a very easy example, here’s how you would typically find out whether any of the points in
pts.shp is within any of the polygons of
from pyGDsandbox.geo_tools import pip_shps, pip_shps_multi correspondences = pip_shps('pts.shp', 'polys.shp', polyID_col='polyIDcolumn') correspondences_multi = pip_shps_multi('pts.shp', 'polys.shp')
polyIDcolumn is the name of the column in the dbf for
polys.shp. If you leave that blank, it’ll give back only the position of the polygon (e.g. the 3rd one in the dbf). Please have a look at the code, play with it and have fun. Of course, any comment or suggestion is more than welcome.