MVP for Texas groundwater rights extraction #373
Codecov Report
❌ Your patch status has failed because the patch coverage (24.03%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
@@ Coverage Diff @@
## main #373 +/- ##
==========================================
- Coverage 56.23% 53.65% -2.58%
==========================================
Files 45 49 +4
Lines 4316 4695 +379
Branches 395 416 +21
==========================================
+ Hits 2427 2519 +92
- Misses 1860 2148 +288
+ Partials 29 28 -1
Pull request overview
This PR introduces an MVP for Texas groundwater rights extraction and begins generalizing the COMPASS framework. The main changes port groundwater rights extraction logic from ELM into COMPASS while adding extensibility through hook/callback mechanisms to support different extraction workflows beyond the original wind/solar ordinances.
Changes:
- Generalizes framework terminology (city → subdivision, LegalTextValidator → TextKindValidator) for broader applicability
- Adds hook/callback system to TechSpec for customizing document processing and data extraction workflows
- Implements water rights extraction module with RAG-based parsing and Texas water district jurisdictions
- Updates date handling patterns to use more robust unpacking syntax
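
To illustrate the hook/callback idea described above, here is a minimal sketch, not the PR's actual code. `TechSpecSketch`, the two hook field names, `run_pipeline`, and the toy hook functions are all hypothetical; the real `TechSpec` in `compass/utilities/nt.py` adds five hooks whose names and signatures are not shown in this overview. The only point is that optional hooks defaulting to `None` let existing specs keep their default behavior while a new workflow overrides selected steps.

```python
from typing import Callable, NamedTuple, Optional


class TechSpecSketch(NamedTuple):
    """Hypothetical stand-in for TechSpec; field names are illustrative only."""

    name: str
    # Optional hooks default to None so existing specs keep working unchanged.
    post_download_hook: Optional[Callable[[list], list]] = None
    extract_hook: Optional[Callable[[str], dict]] = None


def run_pipeline(spec, docs):
    """Run a toy pipeline, calling each hook only if the spec defines it."""
    if spec.post_download_hook is not None:
        docs = spec.post_download_hook(docs)
    if spec.extract_hook is not None:
        return [spec.extract_hook(doc) for doc in docs]
    # Default behavior when no custom extraction hook is registered.
    return [{"text": doc} for doc in docs]


def drop_empty(docs):
    """Toy document-processing hook: discard whitespace-only documents."""
    return [doc for doc in docs if doc.strip()]


def count_words(doc):
    """Toy extraction hook: report the number of words in a document."""
    return {"n_words": len(doc.split())}


# A spec without hooks follows the default path.
default_spec = TechSpecSketch(name="wind")
# A spec with hooks customizes both steps.
water_spec = TechSpecSketch(
    name="water rights",
    post_download_hook=drop_empty,
    extract_hook=count_words,
)

default_result = run_pipeline(default_spec, ["setback rules", ""])
water_result = run_pipeline(water_spec, ["well spacing rules apply", "  "])
```

In this sketch the pipeline never needs to know which technology it is running; it only checks whether a hook is present, which is one plausible way to support workflows beyond wind and solar ordinances.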
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `pyproject.toml` | Bumps nlr-elm dependency to 0.0.36 |
| `compass/data/tx_water_districts.csv` | Adds 98 Texas water conservation districts as new jurisdiction types |
| `compass/utilities/nt.py` | Extends TechSpec with 5 optional hooks for custom processing workflows |
| `compass/utilities/__init__.py` | Adds cost entries for new LLM models (egswaterord-gpt4.1-mini, text-embedding-ada-002) |
| `compass/utilities/enums.py` | Adds EMBEDDING task enum for text embedding operations |
| `compass/utilities/jurisdictions.py` | Changes jurisdiction loading to support multiple CSV files via registry |
| `compass/utilities/parsing.py` | Refactors date extraction to use robust unpacking pattern |
| `compass/validation/content.py` | Generalizes LegalTextValidator into abstract TextKindValidator base class |
| `compass/validation/graphs.py` | Renames city-specific nodes/prompts to subdivision for broader terminology |
| `compass/extraction/apply.py` | Inverts check_if_legal_doc logic (was is_legal_doc) for clarity |
| `compass/extraction/water/__init__.py` | Exports water rights extraction classes and configuration |
| `compass/extraction/water/ordinance.py` | Implements water rights text collectors and extractors |
| `compass/extraction/water/parse.py` | Implements structured water parser using decision trees and RAG |
| `compass/extraction/water/graphs.py` | Defines 16 decision tree graphs for water rights feature extraction |
| `compass/extraction/water/processing.py` | Implements corpus building, extraction, and data writing hooks |
| `compass/scripts/process.py` | Integrates water extraction workflow and new hook system |
| `compass/scripts/download.py` | Refactors content filtering, makes permitted_use_text_collector optional |
| `compass/services/threaded.py` | Updates date handling to match new pattern |
| `tests/python/unit/validation/test_validation_graphs.py` | Updates tests for city → subdivision renaming |
| `tests/python/unit/validation/test_validation_content.py` | Updates tests for validator generalization |
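
As a rough illustration of generalizing a legal-text validator into an abstract base class, here is a minimal sketch. Every name in it (`TextKindValidatorSketch`, `keywords`, `is_expected_kind`, the two subclasses) is hypothetical, and the keyword-counting check is a deliberately simple stand-in; it is not how the validator in `compass/validation/content.py` decides what kind of text it is looking at.

```python
from abc import ABC, abstractmethod


class TextKindValidatorSketch(ABC):
    """Hypothetical base class: subclasses declare the kind of text they accept."""

    @property
    @abstractmethod
    def keywords(self):
        """Terms that suggest the text is of the expected kind."""

    def is_expected_kind(self, text, min_hits=2):
        """Return True if at least `min_hits` keywords appear in the text."""
        lowered = text.lower()
        hits = sum(1 for word in self.keywords if word in lowered)
        return hits >= min_hits


class LegalTextValidatorSketch(TextKindValidatorSketch):
    """Toy subclass for ordinance-style legal text."""

    @property
    def keywords(self):
        return ("ordinance", "section", "shall")


class WaterRightsValidatorSketch(TextKindValidatorSketch):
    """Toy subclass for groundwater district rules."""

    @property
    def keywords(self):
        return ("groundwater", "well", "permit")


legal_ok = LegalTextValidatorSketch().is_expected_kind(
    "This ordinance shall apply to Section 2."
)
water_ok = WaterRightsValidatorSketch().is_expected_kind(
    "A permit is required before drilling a well."
)
water_on_legal = WaterRightsValidatorSketch().is_expected_kind(
    "This ordinance shall apply to Section 2."
)
```

The shared logic lives in the base class, so adding a new text kind only requires a subclass that supplies its own criteria.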
@spodgorny9 implemented the initial groundwater rights extraction in ELM, and this is the MVP port into COMPASS. It's still rough around the edges, but Slater and I will discuss next steps with this code and massage it into a good state.
I am pushing this PR now, though, because it also includes the beginnings of the COMPASS framework generalization, which I am eager to implement ASAP.