Expert-level Claude Code skills for data engineering: dbt, Fivetran, Kafka, Airflow, Snowflake, and more.
This repository contains a curated suite of skills that enable Claude Code to provide expert guidance for data engineering workflows. Whether you're building data pipelines, modeling in dbt, integrating SaaS tools, or streaming events with Kafka, these skills help Claude understand your context and provide detailed, actionable guidance.
What problem does this solve? Data engineering involves many specialized tools (dbt, Fivetran, Kafka, Airflow, Snowflake, BigQuery, etc.) with deep best practices. These skills give Claude the domain expertise to help you make decisions, write code, debug issues, and architect solutions across the modern data stack.
Find your use case below and install the corresponding skill:
| I want to... | Use this skill | Status |
|---|---|---|
| Connect Salesforce, NetSuite, Stripe, or HubSpot to my data warehouse | data-integration | Available |
| Set up Fivetran or Airbyte connectors | data-integration | Available |
| Build real-time streaming pipelines with Kafka or Flink | event-streaming | Available |
| Stream data into Snowflake, BigQuery, or Databricks | event-streaming | Available |
| Write dbt models, tests, and documentation | dbt-transforms | Available |
| Set up dbt CI/CD with slim CI and artifacts | dbt-transforms | Available |
| Optimize dbt performance (incremental models, materializations) | dbt-transforms | Available |
| Write Python for data engineering (dbt-py, PySpark, Pandas, API scripts) | python-data-engineering | Available |
| Design Airflow or Dagster DAGs | data-pipelines | Available |
| Schedule and monitor data pipelines | data-pipelines | Available |
| Use AI/LLMs in data workflows (embeddings, semantic search, MCP) | ai-data-integration | Available |
| Analyze local CSV/Excel/Parquet files with DuckDB | duckdb | Available |
| Run a data cleaning engagement for a client | client-delivery | Available |
| Build portable DLT pipelines from file sources | dlt-extract | Available |
| Generate zero-shot time-series forecasts with foundation models | tsfm-forecast | Available |
| Design a data testing strategy with SQL assertions and test reports | data-testing | Available |
| Implement data governance — cataloging, lineage, classification, access control | data-governance | Available |
| Set up data observability — freshness monitoring, alerting, incident response | data-observability | Available |
| Work with Azure Data Factory, Synapse, Fabric, or SQL Server | microsoft-data-stack | Available |
| Migrate SSIS packages to modern ETL (ADF, dbt, dlt) | microsoft-data-stack | Available |
| Choose between Snowflake, BigQuery, Databricks, DuckDB, Synapse, or SQL Server | shared-references/warehouse-comparison | Available |
| Implement data quality checks (freshness, completeness, accuracy) | shared-references/data-quality-patterns | Available |
Don't see your use case? Check the full Catalog or open an issue to request a new skill.
```bash
git clone https://github.com/dtsong/data-engineering-skills
cd data-engineering-skills
./install.sh
```

This installs all available skills to `~/.claude/skills/data-engineering-skills/`.
```bash
./install.sh --role analytics-engineer
./install.sh --role data-platform-engineer
./install.sh --role integration-engineer
./install.sh --role ml-engineer
```

See Role-Based Presets below for what each role includes.
```bash
./install.sh --skills dbt-transforms,event-streaming
./install.sh --skills data-integration
```

To install manually, clone the repository directly into your skills directory:

```bash
git clone https://github.com/dtsong/data-engineering-skills ~/.claude/skills/data-engineering-skills
```

Skills work on Windows via WSL (Windows Subsystem for Linux) or Git Bash:
- **WSL (recommended):** Run `./install.sh` from your WSL terminal. Skills install to `~/.claude/skills/` inside WSL.
- **Git Bash:** Run `./install.sh` from Git Bash. Paths map to your Windows home directory.
- **Skills are platform-agnostic:** The skill files themselves are plain markdown and work on any OS. Only the install script requires a bash-compatible shell.
Native Windows CMD or PowerShell users should use WSL or Git Bash for the install script.
```bash
cd ~/.claude/skills/data-engineering-skills
git pull
```

Or use the installer:

```bash
./install.sh --update
```

Not sure which skills to install? We've created presets for common roles:
| Role | Skills Installed | Description |
|---|---|---|
| analytics-engineer | dbt-transforms, python-data-engineering | Transform and model data using SQL and Python |
| data-platform-engineer | All skills | Full toolkit for building and maintaining data platforms |
| integration-engineer | data-integration, event-streaming, data-pipelines | Connect systems, orchestrate pipelines, handle real-time data |
| ml-engineer | python-data-engineering, ai-data-integration, tsfm-forecast | Python-first workflows, AI/ML pipelines, time-series forecasting |
| data-consultant | dbt-transforms, duckdb, client-delivery, dlt-extract, data-pipelines, data-testing | End-to-end data cleaning engagements |
| microsoft-data-engineer | microsoft-data-stack, data-pipelines, data-governance, dbt-transforms, data-observability | Azure/SQL Server data platform engineering |
Example:

```bash
./install.sh --role analytics-engineer
```

This installs:
- dbt-transforms (modeling, testing, CI/CD, performance)
- python-data-engineering (dbt-py, Pandas, PySpark, API scripts)
- Shared references (data-quality-patterns, warehouse-comparison)
Skills are prompt templates that give Claude deep domain knowledge. Here's how they work:
- Auto-activation: When you mention keywords like "dbt", "Fivetran", "Kafka", or "Airflow", Claude automatically loads the relevant skill.
- Progressive disclosure: Skills provide core guidance first, then offer references for deep dives (e.g., "See dbt-testing-guide.md for 30+ test examples").
- No manual activation needed: You don't need to explicitly invoke skills—just start asking questions.
- Context-aware: Skills know when to activate based on your conversation, file context, and project structure.
Example conversation:
You: "Help me write a dbt staging model for Stripe charges"
Claude (dbt-transforms auto-activates): "I'll help you create a staging model following dbt best practices. Here's a model that handles Stripe's nested JSON structure and adds data quality tests..."
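Under the hood, each skill is a directory containing a `SKILL.md` file whose YAML frontmatter tells Claude Code when to load it. A minimal sketch — the field contents below are illustrative, not copied from this repository:

```yaml
# SKILL.md frontmatter -- illustrative sketch of a skill definition
---
name: dbt-transforms
description: >
  dbt modeling, testing, incremental strategies, CI/CD, and performance.
  Activates when the conversation involves dbt models, tests,
  or materializations.
---
# The markdown body below the frontmatter carries the core guidance,
# with links to reference files for progressive disclosure.
```

The `description` is what drives auto-activation: Claude matches it against your conversation and project context to decide which skill to load.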
| Skill | Description | Lines | Files |
|---|---|---|---|
| dbt-transforms | dbt modeling, testing, incremental strategies, CI/CD, performance, governance | 3,000 | 7 |
| data-integration | Fivetran, Airbyte, API extraction, CDC, Reverse ETL, enterprise connectors | 3,650 | 7 |
| event-streaming | Kafka, Flink, Spark Streaming, warehouse streaming, event architectures | 2,500 | 6 |
| data-pipelines | Dagster, Airflow, Prefect, scheduling, monitoring, consulting orchestration | 2,500 | 6 |
| python-data-engineering | Polars, Pandas, PySpark, dbt Python models, API extraction, data validation | 2,700 | 6 |
| ai-data-integration | MCP servers, NL-to-SQL, embeddings, LLM transforms | 1,750 | 5 |
| duckdb | DuckDB local analysis, CSV/Excel/Parquet/JSON ingestion, profiling, export | 690 | 7 |
| client-delivery | Engagement lifecycle, schema profiling, deliverables, client handoff | 760 | 7 |
| dlt-extract | File-based DLT pipelines, destination swapping, schema contracts | 750 | 7 |
| tsfm-forecast | Zero-shot time-series forecasting with TimesFM, Chronos, MOIRAI, Lag-Llama | 730 | 10 |
| data-testing | Testing strategy, SQL assertions, pipeline validation, test-as-deliverable | 850 | 7 |
| data-governance | Data cataloging, lineage, classification, access control, compliance | 1,110 | 8 |
| data-observability | Freshness monitoring, volume anomaly detection, alerting, incident response | 1,160 | 8 |
| microsoft-data-stack | ADF orchestration, Synapse/Fabric lakehouse, SQL Server CDC, SSIS migration, dbt-sqlserver | 1,200 | 4 |
| Reference | Description | Lines |
|---|---|---|
| data-quality-patterns | Tool-agnostic quality frameworks (four pillars, anomaly detection, alerting) | 300 |
| warehouse-comparison | Snowflake vs BigQuery vs Databricks vs DuckDB decision matrix | 300 |
| security-compliance-patterns | Three-tier security framework, credential management, data classification | 300 |
| security-tier-model | Consulting security tiers, ENGAGEMENT.yaml schema, tier transitions | 300 |
| dlt-vs-managed-connectors | DLT vs Fivetran vs Airbyte decision matrix | 300 |
See full details in CATALOG.md.
```bash
./install.sh --skills data-integration
```

Then ask Claude:
"How do I set up Fivetran to sync Salesforce to Snowflake with incremental updates?"
Skill activates: data-integration
Claude provides:
- Fivetran connector setup steps
- Schema mapping guidance
- Incremental sync configuration
- Data quality checks for Salesforce data
- Common gotchas (API limits, field changes)
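Once the connector is syncing, freshness can be monitored downstream with dbt. Here's a minimal sketch of a sources file, assuming Fivetran lands Salesforce in a `salesforce` schema (Fivetran stamps each synced row with a `_fivetran_synced` timestamp column):

```yaml
# models/staging/salesforce/_sources.yml -- illustrative; the schema and
# table names depend on your connector configuration
version: 2

sources:
  - name: salesforce
    schema: salesforce
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: account
      - name: opportunity
```

Running `dbt source freshness` then alerts you when a sync stalls before stale data reaches your models.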
```bash
./install.sh --skills dbt-transforms
```

Then ask Claude:
"Help me write a dbt mart model that calculates customer lifetime value with data quality tests"
Skill activates: dbt-transforms
Claude provides:
- Mart model structure following best practices
- LTV calculation logic
- Data quality tests (uniqueness, not-null, ranges)
- Performance optimization (incremental strategy if needed)
- Documentation template
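A minimal sketch of what such a mart model might look like — table and column names here are hypothetical; the skill tailors them to your actual staging layer:

```sql
-- models/marts/fct_customer_ltv.sql -- illustrative sketch, not a drop-in model
{{ config(materialized='table') }}

with charges as (

    -- assumes a staging model with one row per charge
    select customer_id, amount, created_at
    from {{ ref('stg_stripe__charges') }}
    where status = 'succeeded'

)

select
    customer_id,
    count(*)        as charge_count,
    sum(amount)     as lifetime_value,
    min(created_at) as first_charge_at,
    max(created_at) as last_charge_at
from charges
group by customer_id
```

A companion `schema.yml` would then declare `unique` and `not_null` tests on `customer_id` and a range test on `lifetime_value`.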
```bash
./install.sh --skills event-streaming
```

Then ask Claude:
"How do I stream orders from PostgreSQL to BigQuery using Kafka Connect?"
Skill activates: event-streaming
Claude provides:
- Kafka Connect source connector config (Debezium for PostgreSQL CDC)
- Kafka Connect sink connector config (BigQuery)
- Schema evolution handling
- Monitoring and alerting setup
- Error handling and dead letter queue configuration
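The source side of that pipeline might look like the following Debezium connector config, posted to Kafka Connect's REST API — hostnames, database names, and credentials below are placeholders:

```json
{
  "name": "orders-postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "${file:/secrets/pg.properties:password}",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "table.include.list": "public.orders",
    "plugin.name": "pgoutput"
  }
}
```

This emits change events to a `shop.public.orders` topic, which a BigQuery sink connector can then consume; the skill walks through the sink side, schema evolution, and dead letter queue settings.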
We welcome contributions! Here's how you can help:
Open an issue at github.com/dtsong/data-engineering-skills/issues
Examples:
- "Add Prefect guidance to data-pipelines"
- "Include Polars examples in python-data-engineering"
- "Bug: dbt incremental strategy example has incorrect syntax"
See CONTRIBUTING.md for guidelines on:
- Adding new reference files
- Improving existing skills
- Creating new skills
- Updating documentation
PR requirements:
- Link to a GitHub issue (required)
- Clear description of changes
- Examples/tests if applicable
- Follow existing structure and style
Tell us how these skills are working for you! Open a discussion at github.com/dtsong/data-engineering-skills/discussions
- dbt-transforms
- data-integration
- event-streaming
- Shared references (data-quality-patterns, warehouse-comparison)
- data-pipelines (Airflow, Dagster, Prefect)
- python-data-engineering (dbt-py, Pandas, PySpark, API scripts)
- ai-data-integration (AI agents, MCP, embeddings)
- duckdb (local data analysis, file ingestion, profiling)
- client-delivery (engagement lifecycle, deliverables, client handoff)
- dlt-extract (file-based DLT pipelines, destination swapping)
- Consulting security tier model, scripts, and templates
- data-testing (testing strategy, SQL assertions, pipeline validation, test-as-deliverable)
- data-governance (cataloging, lineage, classification, access control, compliance)
- data-observability (freshness monitoring, volume anomaly detection, alerting, incident response)
Apache License 2.0
Copyright 2026 Daniel Song
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
See LICENSE for full text.
- Claude Code: github.com/anthropics/claude-code
- dbt: docs.getdbt.com
- Fivetran: fivetran.com/docs
- Airbyte: docs.airbyte.com
- Apache Kafka: kafka.apache.org/documentation
- Apache Airflow: airflow.apache.org/docs
- Dagster: docs.dagster.io
- Snowflake: docs.snowflake.com
- BigQuery: cloud.google.com/bigquery/docs
- Databricks: docs.databricks.com
- Azure Data Factory: learn.microsoft.com/azure/data-factory
- Azure Synapse: learn.microsoft.com/azure/synapse-analytics
- Microsoft Fabric: learn.microsoft.com/fabric
- Issues/Features: github.com/dtsong/data-engineering-skills/issues
- Discussions: github.com/dtsong/data-engineering-skills/discussions
- Email: Available in GitHub profile
Happy data engineering! 🚀