Description
Homework 4 – Spatial Analysis of Peru’s Protected Areas
1. Learning goals
- Practice querying vector data from OpenStreetMap (OSM) via the Overpass API.
- Download and work with MODIS Vegetation Continuous Fields (VCF, product MOD44B v061) for the year 2024.
- Perform buffer-based spatial joins to link raster pixels with protected-area boundaries.
- Compute and interpret summary statistics of tree-cover variables at multiple buffer widths.
- (Extra credit) Build a regression-discontinuity (RD) plot to explore causal impacts of legal protection on forest cover.
2. Required software & libraries
- Python ≥ 3.9
- Typical spatial stack
- Python:
osmnx
,pyrosm
,requests
,geopandas
,rasterio
,rioxarray
,xarray
,numpy
,pandas
,shapely
,matplotlib
,statsmodels
- Python:
Tip: Generate a new environment in conda for this project such that all the libraries are in the correct version
3. Tasks (detailed)
3.1 Acquire Peru’s protected-area polygons from OSM (25 pts) info
- Formulate an Overpass query that returns all polygons tagged with any of the following (logical OR):
boundary=protected_area
leisure=nature_reserve
protect_class=*
Limit to objects inside Peru (ISO 3166-1 alpha-2 codePE
).
- Fetch the data programmatically (e.g., with
requests
orosmnx.features_from_polygon
). - Filter to keep national / regional parks, wildlife reserves, and forest reserves.
Keep your filtering rule explicit in code (e.g., acceptprotect_class
∈ {“2”, “4”, …}). - Save the final boundaries to GeoPackage
peru_protected_areas.gpkg
.
3.2 Download VCF (MOD44B v061) for 2024 (25 pts) info
- Register for an Earthdata Login (if you haven’t already).
- Using
requests
orwget
, programmatically pull all tiles covering Peru for collection 6.1, subset “Percent Tree Cover” (science datasetPercent_Tree_Cover
), year = 2024. - Mosaic & reproject to EPSG:4326; name the raster
vcf_2024_peru.tif
.
Hint:rio merge
,gdalwarp -t_srs EPSG:4326
, orrioxarray.combine_by_coords
. - It has many layers. We want to work with Percent_Tree_Cover.
3.3 Generate buffer zones & extract raster data (50 pts)
For each protected-area polygon P and each bandwidth b ∈ {5, 10, 20, 25 km}:
-
Create two mutually exclusive geometries:
- Inner buffer:
b
km inside the boundary. - Outer buffer:
b
km out of the polygon since the boundary. - Use the boundary to generate these rings. We want to inspect forest around protected areas.
- Inner buffer:
-
Use Zonal stats to generate the statistics (mean, median, sd, n pixels) forest cover for the inner buffer and .
-
Store results in a tidy CSV:
park_id buffer_km in_out mean_tc median_tc sd_tc Percent_Tree_Cover -
Generate a simple binary regression with mean of Percent_Tree_Cover as outcome and D (in boundary==1) treatment variable. Weight the observations using the number of pixels.
3.5 (Extra credit, 100 pts) RD plot at pixel level
- Change the aggregation level of your data at pixel level. Keep all the pixels inside the inner and outer buffer areas using pixel centroids. We want this analysis only for b = 25 km.
- Find the minimum distance from the pixel centroid to boundary.
- Generate rd plots using distance as the score and forest value (Percent_Tree_Cover value of the raster). The score has positive values if the centroid is in the inner buffer, and negative otherwise.
- Discuss the results
4. 🌳 Paper Analysis with LLMs
Download the following paper titled Tree Cover and Carbon Dioxide Capture in Urban Parks: The Case of Northern Lima.
Objetive
Develop a reproducible workflow that allows you to:
- Upload and process documents (PDF).
- Extract information (text).
- Apply prompt engineering techniques to obtain insights using language models (LLMs), such as park efficiency rankings, optimal species recommendations, and district-level diagnostics.
- Assess their use for the development of public policies.
Required Tools
- Google Colab
- Model GPT: 3.5 (Important)
- Libraries:
-
PyMuPDF
(fitz
) -
API KEY GPT: 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
-
openia
-
5. Helpful references
-
Overpass API & OSM tags
- Beginner guide: https://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_turbo/Quick_guide
- Example Python usage (
osmnx
): https://osmnx.readthedocs.io/en/stable/osmnx.html#osmnx.geometries.geometries_from_polygon
-
MODIS VCF (MOD44B v061, 250 m)
- Product page: https://lpdaac.usgs.gov/products/mod44bv061/
- Direct HTTPS directory (LAADS DAAC):
https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MOD44B/2024/
-
Raster & vector handling
- Rasterio tutorial: https://rasterio.readthedocs.io/en/latest/quickstart.html
- GeoPandas buffering: https://geopandas.org/en/stable/docs/user_guide/geometric_manipulations.html#buffer
- rioxarray for merging tiles: https://corteva.github.io/rioxarray/stable/examples/merge_tutorial.html
-
RD visualization
statsmodels.nonparametric.kernel_regression.KernelReg
(Python)rdrobust
package (R): https://rdpackages.github.io/