AI-powered research agent that pairs SEC Form D / Reg CF deals with Form ADV advisers using a local SQLite lakehouse of parsed filings plus analytics views that compute adviser–deal fit scores.
- Ingests quarterly Form D, Reg CF, and Form ADV datasets into normalized staging tables (
data/staging.sqlite). - Materializes feature-rich SQL views (
data/analytics_views.sql) that clean currency fields, derive ratios/buckets, and build adviser/deal matching candidates. - Scores every candidate on geography, capital sufficiency, investor audience alignment, ticket fit, traction, and security expertise, exposing both component and composite scores.
- Wraps Google Gemini 2.5 Flash in an Agno agent (
main.py) with strict planning + SQL guardrails so every response is backed by reproducible queries. - Emits markdown briefs (see
markdown/) summarizing top advisers for a deal, including RAUM, component scores, and rationale.
MatchingAgent/
├── data/
│ ├── staging_schema.sql # Table definitions for Form D, Reg CF, ADV
│ ├── analytics_views.sql # Derived views + scoring logic
│ ├── staging.sqlite # Populated SQLite db (after running loader)
│ └── *.md # Research notes (e.g., Reg CF vs Form D overview)
├── tools/
│ ├── load_staging.py # ETL script for raw TSV/CSV → SQLite
│ └── sequential_thinking_tool.py# Planning scratchpad tool for the agent
├── utils/prompts.py # System prompt describing workflow & guardrails
├── main.py # Entry point that runs the agent and saves markdown output
├── markdown/ # Generated agent reports
└── README.md # This file
- Raw files: place quarterly feeds under
~/Downloads/data/following the expected folder names (e.g.,2025Q1_d,2025Q1_cf,adv-filing-data-20111105-20241231-part1). - Run loader:
python tools/load_staging.py- Drops any existing staging tables defined in
data/staging_schema.sql. - Normalizes booleans/dates, parses currency strings, and writes each TSV/CSV into SQLite.
- Drops any existing staging tables defined in
- Materialize views:
sqlite3 data/staging.sqlite < data/analytics_views.sql- Creates the latest-submission, feature, candidate, and scoring views the agent relies on.
Detailed ETL notes live in markdown/project_overview.md, while markdown/view_scoring_details.md documents every view and score formula.
vw_fd_features/vw_cf_featuresconvert issuer economics into clean numerics plus buckets.vw_adv_featureskeeps each adviser’s latest RAUM, client mix, affiliations, and registered states.vw_investor_deal_candidatesjoins deals to advisers when geography/licensure align and emits binary fit flags.vw_investor_deal_scoredlayers continuous component scores and a weighted composite (25% geography, 25% capital, 20% audience, 15% ticket, 10% traction, 5% security).- See
markdown/view_scoring_details.mdfor the full breakdown.
- System prompt (
utils/prompts.py): enforces plan-first tool usage, schema inspection, SQL-only answers, and markdown outputs containing identifiers, geography, RAUM, component scores, and contact info. - Tools: the Agno agent loads two tools—
SequentialThinkingTools(custom planner) andSQLTools(runs queries againstdata/staging.sqlite). - Run: edit
USER_INPUTinmain.pyor wrap the agent in your own CLI/web interface; when executed, it stores responses undermarkdown/output_<timestamp>.md.
- Install dependencies (example):
python -m venv .venv source .venv/bin/activate pip install -r requirements.txt # create one with agno, pandas, sqlite-utils, etc.
- Place raw SEC data in
~/Downloads/data/as described above. - Run
python tools/load_staging.pyto builddata/staging.sqlite. - Apply analytics views:
sqlite3 data/staging.sqlite < data/analytics_views.sql. - Set the Gemini API key if not using the hardcoded placeholder (e.g., via
export GEMINI_API_KEY=...and updatemain.py). - Execute
python main.pyto generate a markdown advisor brief.
- “Find the top 5 advisers for Form D accession 0000005108-25-000002.”
- “Show deals that adviser FilingID 1620806 is a fit for.”
- “List Reg CF offerings that allow retail investors but lack Texas-registered advisers.”
Each request triggers: plan → schema check → SQL query (usually against vw_investor_deal_scored) → markdown summary & table.
- Adjust scoring thresholds/weights directly in
data/analytics_views.sql. - Add new data feeds or contact tables by extending
staging_schema.sql, updatingload_staging.py, and enriching the feature views. - Swap Gemini for another model supported by Agno by editing the
Agentconstructor inmain.py.
markdown/project_overview.md: narrative of the full pipeline and agent architecture.markdown/view_scoring_details.md: per-view transformations and score formulas.
For questions or enhancements, open an issue or reach out to the data/AI team maintaining this repository.