Merged
14 changes: 12 additions & 2 deletions ai-data-integration/SKILL.md
@@ -1,7 +1,15 @@
---
name: ai-data-integration
description: "Use this skill when connecting AI or LLMs to data platforms. Covers MCP servers for warehouses, natural-language-to-SQL, embeddings for data discovery, LLM-powered enrichment, and AI agent data access patterns. Common phrases: \"text-to-SQL\", \"MCP server for Snowflake\", \"LLM data enrichment\", \"AI agent access\". Do NOT use for general data integration (use data-integration) or dbt modeling (use dbt-transforms)."
model_tier: reasoning
model:
preferred: sonnet
acceptable: [sonnet, opus]
minimum: sonnet
allow_downgrade: false
reasoning_demand: medium
conditions:
- when: "designing novel security tier taxonomy from scratch"
hold_at: opus
version: 1.0.0
---

Expand All @@ -28,7 +36,9 @@ Expert guidance for integrating AI/LLM capabilities with data engineering system

| reasoning_demand | preferred | acceptable | minimum |
|-----------------|-----------|------------|---------|
| high | Opus | Sonnet | Sonnet |
| medium | Sonnet | Sonnet, Opus | Sonnet |

Condition: designing novel security tier taxonomy from scratch → hold at Opus.

## Core Principles

9 changes: 7 additions & 2 deletions client-delivery/SKILL.md
@@ -1,7 +1,12 @@
---
name: client-delivery
description: "Use this skill when managing a consulting data cleaning engagement. Covers engagement setup, schema profiling, security tier selection, project scaffolding, deliverable generation, and client handoff. Common phrases: \"set up a cleaning project\", \"profile this schema\", \"data cleaning engagement\", \"generate deliverables\", \"client handoff\". Do NOT use for writing dbt models (use dbt-transforms), DuckDB queries (use duckdb), or pipeline orchestration (use data-pipelines)."
model_tier: analytical
model:
preferred: sonnet
acceptable: [sonnet, opus]
minimum: sonnet
allow_downgrade: false
reasoning_demand: medium
version: 1.0.0
---

@@ -33,7 +38,7 @@ Guides data cleaning engagements from discovery through client handoff. Microsof

| reasoning_demand | preferred | acceptable | minimum |
|-----------------|-----------|------------|---------|
| medium | Sonnet | Opus, Haiku | Haiku |
| medium | Sonnet | Sonnet, Opus | Sonnet |

## Core Principles

9 changes: 7 additions & 2 deletions data-integration/SKILL.md
@@ -1,7 +1,12 @@
---
name: data-integration
description: "Use this skill when designing data integrations or connecting systems. Covers iPaaS platforms (Workato, MuleSoft, Boomi), dlt pipelines, API patterns, CDC, webhooks, and Reverse ETL. Common phrases: \"connect these systems\", \"build a dlt pipeline\", \"event-driven architecture\", \"change data capture\". Do NOT use for stream processing frameworks (use event-streaming) or pipeline scheduling (use data-pipelines)."
model_tier: analytical
model:
preferred: sonnet
acceptable: [sonnet, opus]
minimum: sonnet
allow_downgrade: false
reasoning_demand: medium
version: 1.0.0
---

@@ -27,7 +32,7 @@ This skill covers enterprise data integration patterns. It does NOT cover: basic

| reasoning_demand | preferred | acceptable | minimum |
|-----------------|-----------|------------|---------|
| medium | Sonnet | Opus, Haiku | Haiku |
| medium | Sonnet | Sonnet, Opus | Sonnet |

## Core Principles

9 changes: 7 additions & 2 deletions data-pipelines/SKILL.md
@@ -1,7 +1,12 @@
---
name: data-pipelines
description: "Use this skill when scheduling, orchestrating, or monitoring data pipelines. Covers Dagster assets, Airflow DAGs, Prefect flows, sensors, retries, alerting, and cross-tool integrations (dagster-dbt, dagster-dlt). Common phrases: \"schedule this pipeline\", \"Dagster vs Airflow\", \"add retry logic\", \"pipeline alerting\", \"consulting pipeline\". Do NOT use for building transformations (use dbt-transforms or python-data-engineering) or designing integration patterns (use data-integration)."
model_tier: analytical
model:
preferred: sonnet
acceptable: [sonnet, opus]
minimum: sonnet
allow_downgrade: false
reasoning_demand: medium
version: 1.0.0
---

@@ -31,7 +36,7 @@ Expert guidance for orchestrating data pipelines. Dagster-first for greenfield p

| reasoning_demand | preferred | acceptable | minimum |
|-----------------|-----------|------------|---------|
| medium | Sonnet | Opus, Haiku | Haiku |
| medium | Sonnet | Sonnet, Opus | Sonnet |

## Core Principles

9 changes: 7 additions & 2 deletions dbt-transforms/SKILL.md
@@ -1,7 +1,12 @@
---
name: dbt-transforms
description: "Use this skill when building or reviewing dbt models, tests, or project structure. Triggers on analytics engineering tasks including staging/marts layers, materializations, incremental strategies, Jinja macros, sources, warehouse configuration, DuckDB adapter, data cleaning, and deduplication patterns. Common phrases: \"dbt model\", \"write a dbt test\", \"incremental strategy\", \"semantic layer\", \"dbt DuckDB\", \"cleaning patterns\". Do NOT use for Python DataFrame code (use python-data-engineering), pipeline scheduling (use data-pipelines), or standalone DuckDB queries without dbt (use duckdb)."
model_tier: analytical
model:
preferred: sonnet
acceptable: [sonnet, opus]
minimum: sonnet
allow_downgrade: false
reasoning_demand: medium
version: 1.0.0
---

@@ -26,7 +31,7 @@ Comprehensive dbt guidance covering project structure, modeling, testing, CI/CD,

| reasoning_demand | preferred | acceptable | minimum |
|-----------------|-----------|------------|---------|
| medium | Sonnet | Opus, Haiku | Haiku |
| medium | Sonnet | Sonnet, Opus | Sonnet |

## Core Principles

9 changes: 7 additions & 2 deletions dlt-extract/SKILL.md
@@ -1,7 +1,12 @@
---
name: dlt-extract
description: "Use this skill when building DLT pipelines for file-based or consulting data extraction. Covers Excel/CSV/SharePoint ingestion via DLT, destination swapping (DuckDB dev to warehouse prod), schema contracts for cleaning, and portable pipeline patterns. Common phrases: \"dlt pipeline for files\", \"extract Excel with dlt\", \"portable data pipeline\", \"dlt filesystem source\". Do NOT use for core DLT concepts like REST API or SQL database sources (use data-integration) or pipeline scheduling (use data-pipelines)."
model_tier: analytical
model:
preferred: sonnet
acceptable: [sonnet, opus]
minimum: sonnet
allow_downgrade: false
reasoning_demand: medium
version: 1.0.0
---

@@ -33,7 +38,7 @@ File-based extraction and consulting portability only. Hands off to data-integra

| reasoning_demand | preferred | acceptable | minimum |
|-----------------|-----------|------------|---------|
| medium | Sonnet | Opus, Haiku | Haiku |
| medium | Sonnet | Sonnet, Opus | Sonnet |

## Core Principles

9 changes: 7 additions & 2 deletions duckdb/SKILL.md
@@ -1,7 +1,12 @@
---
name: duckdb
description: "Use this skill when working with DuckDB for local data analysis, file ingestion, or data exploration. Covers reading CSV/Excel/Parquet/JSON files into DuckDB, SQL analytics on local data, data profiling, cleaning transformations, and export to various formats. Common phrases: \"analyze this CSV\", \"DuckDB query\", \"local data analysis\", \"read Excel in SQL\", \"profile this data\". Do NOT use for dbt model building (use dbt-transforms with DuckDB adapter) or cloud warehouse administration."
model_tier: analytical
model:
preferred: sonnet
acceptable: [sonnet, opus]
minimum: sonnet
allow_downgrade: false
reasoning_demand: medium
version: 1.0.0
---

@@ -26,7 +31,7 @@ Local-first SQL analytics on files. Read, profile, clean, and export data withou

| reasoning_demand | preferred | acceptable | minimum |
|-----------------|-----------|------------|---------|
| medium | Sonnet | Opus, Haiku | Haiku |
| medium | Sonnet | Sonnet, Opus | Sonnet |

## Core Principles

9 changes: 7 additions & 2 deletions event-streaming/SKILL.md
@@ -1,7 +1,12 @@
---
name: event-streaming
description: "Use this skill when building real-time or near-real-time data pipelines. Covers Kafka, Flink, Spark Streaming, Snowpipe, BigQuery streaming, materialized views, and batch-vs-streaming decisions. Common phrases: \"real-time pipeline\", \"Kafka consumer\", \"streaming vs batch\", \"low latency ingestion\". Do NOT use for batch integration patterns (use data-integration) or pipeline orchestration (use data-pipelines)."
model_tier: reasoning
model:
preferred: sonnet
acceptable: [sonnet, opus]
minimum: sonnet
allow_downgrade: false
reasoning_demand: medium
version: 1.0.0
---

@@ -37,7 +42,7 @@ Do NOT use for: batch ETL (use `dbt-transforms`), static data modeling, SQL opti

| reasoning_demand | preferred | acceptable | minimum |
|-----------------|-----------|------------|---------|
| high | Opus | Sonnet | Sonnet |
| medium | Sonnet | Sonnet, Opus | Sonnet |

## Core Principles

1 change: 1 addition & 0 deletions pipeline/config/budgets.json
@@ -12,6 +12,7 @@
"overrides": {
"python-data-engineering/references/data-validation-patterns.md": {
"reference_max_words": 1130,
"reference_max_tokens": 1500,
"reason": "3% over target (1130w vs 1100w). Contains Pydantic, Pandera, and Great Expectations patterns \u2014 all three frameworks are essential for the skill's validation coverage. Trimming further would remove one framework entirely."
}
}
20 changes: 16 additions & 4 deletions pipeline/config/model-routing.yaml
@@ -1,6 +1,6 @@
# Model Routing Configuration
# See specs/SKILL-MODEL-ROUTING-SPEC.md for full specification.
spec_version: "1.3"
spec_version: "1.4"

# Default model preferences by skill classification
defaults:
@@ -19,9 +19,9 @@ defaults:

# Budget zone thresholds (percentage of max_simultaneous_tokens)
zones:
green: 0.70 # 0-70%: use preferred models
yellow: 0.90 # 70-90%: downgrade low/medium reasoning_demand
red: 1.00 # 90-100%: downgrade all to minimum tier
green: 0.70
yellow: 0.90
red: 1.00

# Task type defaults
task_types:
@@ -35,5 +35,17 @@ task_types:
preferred: opus
description: "Debugging, architecture, complex reasoning"

# Tier-to-model mapping
tiers:
haiku:
claude_code: claude-haiku-4-5-20251001
cost_ratio: 1
sonnet:
claude_code: claude-sonnet-4-6
cost_ratio: 5
opus:
claude_code: claude-opus-4-6
cost_ratio: 25

# Per-skill overrides (empty by default — suites populate as needed)
overrides: {}
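The new `tiers` block maps tier names to provider model IDs and cost ratios. A minimal sketch of how a consumer might resolve it — the dict mirrors the config above, but the lookup functions themselves are assumptions, not part of the shipped pipeline:

```python
# Mirrors the tiers mapping in pipeline/config/model-routing.yaml.
TIERS = {
    "haiku":  {"claude_code": "claude-haiku-4-5-20251001", "cost_ratio": 1},
    "sonnet": {"claude_code": "claude-sonnet-4-6", "cost_ratio": 5},
    "opus":   {"claude_code": "claude-opus-4-6", "cost_ratio": 25},
}

def resolve_model(tier: str, provider: str = "claude_code") -> str:
    """Map a tier name to a concrete model ID for the given provider."""
    try:
        return TIERS[tier][provider]
    except KeyError:
        raise ValueError(f"unknown tier or provider: {tier}/{provider}")

def relative_cost(tier: str) -> int:
    """Cost multiple relative to the haiku baseline."""
    return TIERS[tier]["cost_ratio"]
```

Keeping the tier-to-model mapping in config (rather than hard-coded) means a model version bump, like the sonnet-4-5 to sonnet-4-6 change in this PR, touches one file.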
2 changes: 1 addition & 1 deletion pipeline/scripts/budget-report.py
@@ -59,7 +59,7 @@ def get_budget_limits(rel_path, classification, budgets):
word_key = classification + "_max_words"
token_key = classification + "_max_tokens"
if word_key in override:
return override[word_key], override[token_key]
return override[word_key], override.get(token_key, budgets.get(token_key))

word_key = classification + "_max_words"
token_key = classification + "_max_tokens"
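The one-line fix above guards against overrides that set only a word budget: a sketch of the behavior with simplified signatures and hypothetical data (the real function also takes a path and reads the full budgets config):

```python
def get_budget_limits(override, budgets, classification):
    # Simplified from pipeline/scripts/budget-report.py to show the fix.
    word_key = classification + "_max_words"
    token_key = classification + "_max_tokens"
    if word_key in override:
        # Before the fix, override[token_key] raised KeyError for
        # word-only overrides; .get() now falls back to the default.
        return override[word_key], override.get(token_key, budgets.get(token_key))
    return budgets.get(word_key), budgets.get(token_key)

# Hypothetical data: an override that pins words but not tokens.
budgets = {"reference_max_words": 1100, "reference_max_tokens": 1400}
override = {"reference_max_words": 1130}
```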
18 changes: 9 additions & 9 deletions pipeline/specs/SKILL-MODEL-ROUTING-SPEC.md
@@ -87,16 +87,16 @@ tiers:
cost_ratio: 1 # Baseline

sonnet:
claude_code: claude-sonnet-4-5
claude_code: claude-sonnet-4-6
codex: gpt-5.3-codex
description: "Analytical tasks — classification, multi-factor decisions, standard coding"
cost_ratio: 8 # ~8x haiku
cost_ratio: 5 # ~5x haiku (Sonnet 4.6: $3/$15 per MTok)

opus:
claude_code: claude-opus-4-6
codex: gpt-5.3-codex-xl # hypothetical — map to best available
description: "Complex reasoning — debugging, architecture, novel problem solving"
cost_ratio: 60 # ~60x haiku
description: "Adversarial security analysis, formal verification, vulnerability chain synthesis"
cost_ratio: 25 # ~25x haiku (Opus: $15/$75 per MTok)

# Session defaults
defaults:
@@ -607,11 +607,11 @@ tiers:
claude_code: claude-haiku-4-5
cost_ratio: 1
sonnet:
claude_code: claude-sonnet-4-5
cost_ratio: 8
claude_code: claude-sonnet-4-6
cost_ratio: 5
opus:
claude_code: claude-opus-4-6
cost_ratio: 60
cost_ratio: 25

budget_zones:
yellow_threshold: 0.70
@@ -688,10 +688,10 @@ Use eval results to refine routing decisions:
│ Coordinator .............. haiku (routing only) │
│ Mechanical specialist .... haiku (tracing, matching) │
│ Analytical specialist .... sonnet (classification, code) │
│ Reasoning specialist ..... opus (debugging, architecture)
│ Reasoning specialist ..... opus (adversarial, formal)
│ │
│ COST RATIOS │
│ haiku = 1x | sonnet = ~8x | opus = ~60x
│ haiku = 1x | sonnet = ~5x | opus = ~25x
│ │
│ BUDGET ZONES │
│ Green (0-70%) ..... use preferred models │
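The budget zones summarized in the spec's panel amount to a threshold ladder over token usage. A minimal sketch, with thresholds taken from the spec (0.70 / 0.90) and the function name assumed:

```python
def budget_zone(used_tokens: int, max_simultaneous_tokens: int) -> str:
    """Classify usage into the spec's green/yellow/red budget zones."""
    ratio = used_tokens / max_simultaneous_tokens
    if ratio <= 0.70:   # Green: use preferred models
        return "green"
    if ratio <= 0.90:   # Yellow: downgrade low/medium reasoning_demand
        return "yellow"
    return "red"        # Red: downgrade all to minimum tier
```

Note that the `allow_downgrade: false` frontmatter added across the SKILL.md files in this PR opts those skills out of the yellow-zone downgrade, which is consistent with their new `minimum: sonnet` floor.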
9 changes: 7 additions & 2 deletions python-data-engineering/SKILL.md
@@ -1,7 +1,12 @@
---
name: python-data-engineering
description: "Use this skill when writing Python code for data pipelines or transformations. Covers Polars, Pandas, PySpark DataFrames, dbt Python models, API extraction scripts, and data validation with Pydantic or Pandera. Common phrases: \"Polars vs Pandas\", \"PySpark DataFrame\", \"validate this data\", \"Python extraction script\". Do NOT use for SQL-based dbt models (use dbt-transforms) or integration architecture (use data-integration)."
model_tier: analytical
model:
preferred: sonnet
acceptable: [sonnet, opus]
minimum: sonnet
allow_downgrade: false
reasoning_demand: medium
version: 1.0.0
---

@@ -25,7 +30,7 @@ Activate when: choosing between DataFrame libraries, writing Polars/Pandas/PySpa

| reasoning_demand | preferred | acceptable | minimum |
|-----------------|-----------|------------|---------|
| medium | Sonnet | Opus, Haiku | Haiku |
| medium | Sonnet | Sonnet, Opus | Sonnet |

## Core Principles
