Data Engineering Skills for Claude Code

Expert-level Claude Code skills for data engineering: dbt, Fivetran, Kafka, Airflow, Snowflake, and more.

This repository contains a curated suite of skills that give Claude Code expert-level knowledge of data engineering workflows. Whether you're building data pipelines, modeling in dbt, integrating SaaS tools, or streaming events with Kafka, these skills help Claude understand your context and respond with detailed, actionable guidance.

What problem does this solve? Data engineering involves many specialized tools (dbt, Fivetran, Kafka, Airflow, Snowflake, BigQuery, etc.) with deep best practices. These skills give Claude the domain expertise to help you make decisions, write code, debug issues, and architect solutions across the modern data stack.

What Are You Trying to Do?

Find your use case below and install the corresponding skill:

I want to...	Use this skill	Status
Connect Salesforce, NetSuite, Stripe, or HubSpot to my data warehouse	data-integration	Available
Set up Fivetran or Airbyte connectors	data-integration	Available
Build real-time streaming pipelines with Kafka or Flink	event-streaming	Available
Stream data into Snowflake, BigQuery, or Databricks	event-streaming	Available
Write dbt models, tests, and documentation	dbt-transforms	Available
Set up dbt CI/CD with slim CI and artifacts	dbt-transforms	Available
Optimize dbt performance (incremental models, materializations)	dbt-transforms	Available
Write Python for data engineering (dbt-py, PySpark, Pandas, API scripts)	python-data-engineering	Available
Design Airflow or Dagster DAGs	data-pipelines	Available
Schedule and monitor data pipelines	data-pipelines	Available
Use AI/LLMs in data workflows (embeddings, semantic search, MCP)	ai-data-integration	Available
Analyze local CSV/Excel/Parquet files with DuckDB	duckdb	Available
Run a data cleaning engagement for a client	client-delivery	Available
Build portable DLT pipelines from file sources	dlt-extract	Available
Generate zero-shot time-series forecasts with foundation models	tsfm-forecast	Available
Design a data testing strategy with SQL assertions and test reports	data-testing	Available
Implement data governance — cataloging, lineage, classification, access control	data-governance	Available
Set up data observability — freshness monitoring, alerting, incident response	data-observability	Available
Work with Azure Data Factory, Synapse, Fabric, or SQL Server	microsoft-data-stack	Available
Migrate SSIS packages to modern ETL (ADF, dbt, dlt)	microsoft-data-stack	Available
Choose between Snowflake, BigQuery, Databricks, DuckDB, Synapse, or SQL Server	shared-references/warehouse-comparison	Available
Implement data quality checks (freshness, completeness, accuracy)	shared-references/data-quality-patterns	Available

Don't see your use case? Check the full Catalog or open an issue to request a new skill.

Quick Install

Option 1: Install All Skills (Recommended for Platform Engineers)

git clone https://github.com/dtsong/data-engineering-skills
cd data-engineering-skills
./install.sh

This installs all available skills to ~/.claude/skills/data-engineering-skills/.

Option 2: Install by Role

./install.sh --role analytics-engineer
./install.sh --role data-platform-engineer
./install.sh --role integration-engineer
./install.sh --role ml-engineer

See Role-Based Presets below for what each role includes.

Option 3: Install Specific Skills

./install.sh --skills dbt-transforms,event-streaming
./install.sh --skills data-integration

Option 4: Manual Install

git clone https://github.com/dtsong/data-engineering-skills ~/.claude/skills/data-engineering-skills
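To confirm what landed on disk after a full or manual install, a quick listing of the skill folders can help. The sketch below is not part of this repository: the function name `list_installed_skills` is hypothetical, and it assumes each skill sits in its own subdirectory containing a `SKILL.md` file. If the layout differs, pass another marker filename as the second argument.

```shell
#!/usr/bin/env bash
# Sketch: list skill folders under an install directory.
# ASSUMPTION: each skill is a subdirectory that contains a SKILL.md file.
# The function name and the marker filename are illustrative, not taken
# from install.sh.

list_installed_skills() {
  local root="${1:-$HOME/.claude/skills/data-engineering-skills}"
  local marker="${2:-SKILL.md}"
  local dir

  if [ ! -d "$root" ]; then
    echo "not installed: $root" >&2
    return 1
  fi

  for dir in "$root"/*/; do
    # Skip folders (scripts, templates, and so on) without the marker file.
    [ -f "${dir}${marker}" ] || continue
    basename "$dir"
  done
}
```

Calling `list_installed_skills` with no arguments checks the default install path; pass a directory as the first argument to inspect a different location.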

Windows Support

Skills work on Windows via WSL (Windows Subsystem for Linux) or Git Bash:

WSL (recommended): Run ./install.sh from your WSL terminal. Skills install to ~/.claude/skills/ inside WSL.
Git Bash: Run ./install.sh from Git Bash. Paths map to your Windows home directory.
Skills are platform-agnostic: The skill files themselves are plain markdown and work on any OS. Only the install script requires a bash-compatible shell.

Native Windows CMD or PowerShell users should use WSL or Git Bash for the install script.

Update Existing Installation

cd ~/.claude/skills/data-engineering-skills
git pull

Or use the installer:

./install.sh --update

Role-Based Presets

Not sure which skills to install? We've created presets for common roles:

Role	Skills Installed	Description
analytics-engineer	dbt-transforms, python-data-engineering	Transform and model data using SQL and Python
data-platform-engineer	All skills	Full toolkit for building and maintaining data platforms
integration-engineer	data-integration, event-streaming, data-pipelines	Connect systems, orchestrate pipelines, handle real-time data
ml-engineer	python-data-engineering, ai-data-integration, tsfm-forecast	Python-first workflows, AI/ML pipelines, time-series forecasting
data-consultant	dbt-transforms, duckdb, client-delivery, dlt-extract, data-pipelines, data-testing	End-to-end data cleaning engagements
microsoft-data-engineer	microsoft-data-stack, data-pipelines, data-governance, dbt-transforms, data-observability	Azure/SQL Server data platform engineering

Example:

./install.sh --role analytics-engineer

This installs:

dbt-transforms (modeling, testing, CI/CD, performance)
python-data-engineering (dbt-py, Pandas, PySpark, API scripts)
Shared references (data-quality-patterns, warehouse-comparison)
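The preset table above can be read as a plain lookup from role name to skill list. The following is a minimal sketch of that mapping in shell, written only to make the table easier to script against. It is not the logic inside install.sh, and the function name `skills_for_role` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: map a role preset to the skills listed in the Role-Based Presets
# table. ASSUMPTION: this mirrors the README table only; install.sh may
# resolve presets differently.

skills_for_role() {
  case "$1" in
    analytics-engineer)
      echo "dbt-transforms python-data-engineering" ;;
    integration-engineer)
      echo "data-integration event-streaming data-pipelines" ;;
    ml-engineer)
      echo "python-data-engineering ai-data-integration tsfm-forecast" ;;
    data-consultant)
      echo "dbt-transforms duckdb client-delivery dlt-extract data-pipelines data-testing" ;;
    microsoft-data-engineer)
      echo "microsoft-data-stack data-pipelines data-governance dbt-transforms data-observability" ;;
    data-platform-engineer)
      # The table lists this preset as "All skills".
      echo "all" ;;
    *)
      echo "unknown role: $1" >&2
      return 1 ;;
  esac
}
```

For example, `skills_for_role integration-engineer` prints the three skills from that row, and an unrecognized role returns a non-zero status.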

How Skills Work

Skills are prompt templates that give Claude deep domain knowledge. Here's how they work:

Auto-activation: When you mention keywords like "dbt", "Fivetran", "Kafka", or "Airflow", Claude automatically loads the relevant skill.
Progressive disclosure: Skills provide core guidance first, then offer references for deep dives (e.g., "See dbt-testing-guide.md for 30+ test examples").
No manual activation needed: You don't need to explicitly invoke skills—just start asking questions.
Context-aware: Skills know when to activate based on your conversation, file context, and project structure.

Example conversation:

You: "Help me write a dbt staging model for Stripe charges"

Claude (dbt-transforms auto-activates): "I'll help you create a staging model following dbt best practices. Here's a model that handles Stripe's nested JSON structure and adds data quality tests..."
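As a rough mental model of keyword-based activation, the toy function below routes a prompt to a skill name using the keyword examples given above. It is an illustration only: Claude Code decides which skill to load on its own, and neither the function name `route_prompt` nor this matching logic comes from the repository.

```shell
#!/usr/bin/env bash
# Toy illustration of keyword-to-skill routing.
# ASSUMPTION: this is NOT how Claude Code selects skills; it only mirrors
# the keyword examples ("dbt", "Fivetran", "Kafka", "Airflow") in the README.

route_prompt() {
  local prompt
  # Lowercase the prompt so matching is case-insensitive.
  prompt="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"

  # The first matching keyword wins.
  case "$prompt" in
    *dbt*)      echo "dbt-transforms" ;;
    *fivetran*) echo "data-integration" ;;
    *kafka*)    echo "event-streaming" ;;
    *airflow*)  echo "data-pipelines" ;;
    *)          echo "none" ;;
  esac
}
```

Real activation also draws on file context and project structure, as described above, which a simple keyword match like this does not capture.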

Suite Overview

Available Skills

Skill	Description	Lines	Files
dbt-transforms	dbt modeling, testing, incremental strategies, CI/CD, performance, governance	3,000	7
data-integration	Fivetran, Airbyte, API extraction, CDC, Reverse ETL, enterprise connectors	3,650	7
event-streaming	Kafka, Flink, Spark Streaming, warehouse streaming, event architectures	2,500	6
data-pipelines	Dagster, Airflow, Prefect, scheduling, monitoring, consulting orchestration	2,500	6
python-data-engineering	Polars, Pandas, PySpark, dbt Python models, API extraction, data validation	2,700	6
ai-data-integration	MCP servers, NL-to-SQL, embeddings, LLM transforms	1,750	5
duckdb	DuckDB local analysis, CSV/Excel/Parquet/JSON ingestion, profiling, export	690	7
client-delivery	Engagement lifecycle, schema profiling, deliverables, client handoff	760	7
dlt-extract	File-based DLT pipelines, destination swapping, schema contracts	750	7
tsfm-forecast	Zero-shot time-series forecasting with TimesFM, Chronos, MOIRAI, Lag-Llama	730	10
data-testing	Testing strategy, SQL assertions, pipeline validation, test-as-deliverable	850	7
data-governance	Data cataloging, lineage, classification, access control, compliance	1,110	8
data-observability	Freshness monitoring, volume anomaly detection, alerting, incident response	1,160	8
microsoft-data-stack	ADF orchestration, Synapse/Fabric lakehouse, SQL Server CDC, SSIS migration, dbt-sqlserver	1,200	4

Shared References

Reference	Description	Lines
data-quality-patterns	Tool-agnostic quality frameworks (four pillars, anomaly detection, alerting)	300
warehouse-comparison	Snowflake vs BigQuery vs Databricks vs DuckDB decision matrix	300
security-compliance-patterns	Three-tier security framework, credential management, data classification	300
security-tier-model	Consulting security tiers, ENGAGEMENT.yaml schema, tier transitions	300
dlt-vs-managed-connectors	DLT vs Fivetran vs Airbyte decision matrix	300

See full details in CATALOG.md.

Examples

Example 1: Setting up Fivetran for Salesforce

./install.sh --skills data-integration

Then ask Claude:

"How do I set up Fivetran to sync Salesforce to Snowflake with incremental updates?"

Skill activates: data-integration

Claude provides:

Fivetran connector setup steps
Schema mapping guidance
Incremental sync configuration
Data quality checks for Salesforce data
Common gotchas (API limits, field changes)

Example 2: Writing dbt Models with Tests

./install.sh --skills dbt-transforms

Then ask Claude:

"Help me write a dbt mart model that calculates customer lifetime value with data quality tests"

Skill activates: dbt-transforms

Claude provides:

Mart model structure following best practices
LTV calculation logic
Data quality tests (uniqueness, not-null, ranges)
Performance optimization (incremental strategy if needed)
Documentation template

Example 3: Building a Kafka Pipeline

./install.sh --skills event-streaming

Then ask Claude:

"How do I stream orders from PostgreSQL to BigQuery using Kafka Connect?"

Skill activates: event-streaming

Claude provides:

Kafka Connect source connector config (Debezium for PostgreSQL CDC)
Kafka Connect sink connector config (BigQuery)
Schema evolution handling
Monitoring and alerting setup
Error handling and dead letter queue configuration

Contributing

We welcome contributions! Here's how you can help:

Report Issues or Request Features

Open an issue at github.com/dtsong/data-engineering-skills/issues

Examples:

"Add Prefect guidance to data-pipelines"
"Include Polars examples in python-data-engineering"
"Bug: dbt incremental strategy example has incorrect syntax"

Submit Pull Requests

See CONTRIBUTING.md for guidelines on:

Adding new reference files
Improving existing skills
Creating new skills
Updating documentation

PR requirements:

Link to a GitHub issue (required)
Clear description of changes
Examples/tests if applicable
Follow existing structure and style

Share Feedback

Tell us how these skills are working for you! Open a discussion at github.com/dtsong/data-engineering-skills/discussions

Roadmap

Phase 1: Core Skills (Complete)

dbt-transforms
data-integration
event-streaming
Shared references (data-quality-patterns, warehouse-comparison)

Phase 2: Orchestration & Python (Complete)

data-pipelines (Airflow, Dagster, Prefect)
python-data-engineering (dbt-py, Pandas, PySpark, API scripts)

Phase 3: AI Data Integration (Complete)

ai-data-integration (AI agents, MCP, embeddings)

Phase 4: Data Consulting Extension (Complete)

duckdb (local data analysis, file ingestion, profiling)
client-delivery (engagement lifecycle, deliverables, client handoff)
dlt-extract (file-based DLT pipelines, destination swapping)
Consulting security tier model, scripts, and templates

Phase 5: Data Quality, Governance & Observability (Complete)

data-testing (testing strategy, SQL assertions, pipeline validation, test-as-deliverable)
data-governance (cataloging, lineage, classification, access control, compliance)
data-observability (freshness monitoring, volume anomaly detection, alerting, incident response)

License

Apache License 2.0

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

See LICENSE for full text.

Related Resources

Claude Code: github.com/anthropics/claude-code
dbt: docs.getdbt.com
Fivetran: fivetran.com/docs
Airbyte: docs.airbyte.com
Apache Kafka: kafka.apache.org/documentation
Apache Airflow: airflow.apache.org/docs
Dagster: docs.dagster.io
Snowflake: docs.snowflake.com
BigQuery: cloud.google.com/bigquery/docs
Databricks: docs.databricks.com
Azure Data Factory: learn.microsoft.com/azure/data-factory
Azure Synapse: learn.microsoft.com/azure/synapse-analytics
Microsoft Fabric: learn.microsoft.com/fabric

Questions?

Issues/Features: github.com/dtsong/data-engineering-skills/issues
Discussions: github.com/dtsong/data-engineering-skills/discussions
Email: Available in GitHub profile

Happy data engineering! 🚀
