dbt is a data transformation tool; it is not an orchestration tool. As such, it does not handle failing models on its own.
The dbt_basic DAG shows how all models can be built inside a single task.
To improve on this, the dbt_advanced DAG splits each model, and each model's tests, into individual tasks.
The dbt_selectors_standard_schedule DAG goes one step further and splits the project up by selector, so different parts of the dbt project can run at different times and intervals.
The dbt DAGs in this repository are built on top of this blog post on beginner and advanced implementation concepts at the intersection of dbt and Airflow.
To run these DAGs locally:
- Download the Astro CLI
- Download and run Docker
- Clone this repository and `cd` into it.
- Run `astro dev start` to spin up a local Airflow environment and run the accompanying DAGs on your machine.
- Once Airflow is running, add a dbt profiles file at `/home/astro/.dbt/profiles.yml`.
- dbt's `manifest.json` needs to be built each time inside the Airflow Docker container, under `/usr/local/airflow/data-cicd/target`.
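
For example, a minimal sketch (assuming an Airflow 2-style DAG; the DAG id and schedule are illustrative) of a task that compiles the dbt project inside the container so `target/manifest.json` exists before anything tries to parse it:

```python
# Sketch only: compile the dbt project inside the Airflow container so
# target/manifest.json is available for downstream parsing.
# The DAG id and schedule are illustrative, not this repo's actual values.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_DIR = "/usr/local/airflow/data-cicd"  # dbt project root inside the container

with DAG(
    dag_id="dbt_compile_manifest_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    compile_dbt = BashOperator(
        task_id="dbt_compile",
        # --profiles-dir points at the profiles.yml added in the step above
        bash_command=f"cd {DBT_DIR} && dbt compile --profiles-dir /home/astro/.dbt",
    )
```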
- Runs some conditional logic to clone the dbt repo.
- Runs dbt without splitting each dbt model's build into separate tasks.
- Uses the `manifest.json` to build out dependencies as individual Airflow DAG tasks, giving greater visibility into errors and bringing Airflow's retry logic to dbt.
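
A rough sketch of that manifest-parsing idea, assuming an Airflow 2-style DAG and the compiled `manifest.json` path above; the task naming, run/test commands, and dependency wiring shown here follow the common pattern rather than this repo's exact code:

```python
# Sketch only: read manifest.json and create one run task and one test task
# per dbt model, wiring Airflow dependencies from dbt's depends_on graph.
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_DIR = "/usr/local/airflow/data-cicd"
MANIFEST_PATH = f"{DBT_DIR}/target/manifest.json"

with open(MANIFEST_PATH) as f:
    manifest = json.load(f)

with DAG(
    dag_id="dbt_advanced_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_tasks = {}

    # One "dbt run" and one "dbt test" task per model node in the manifest.
    for node_id, node in manifest["nodes"].items():
        if node["resource_type"] != "model":
            continue
        model = node["name"]
        run = BashOperator(
            task_id=f"run_{model}",
            bash_command=f"cd {DBT_DIR} && dbt run --models {model}",
        )
        test = BashOperator(
            task_id=f"test_{model}",
            bash_command=f"cd {DBT_DIR} && dbt test --models {model}",
        )
        run >> test  # a model's tests run right after it builds
        run_tasks[node_id] = run

    # Wire model-to-model dependencies from the manifest's depends_on info.
    for node_id, run in run_tasks.items():
        for upstream_id in manifest["nodes"][node_id]["depends_on"]["nodes"]:
            if upstream_id in run_tasks:
                run_tasks[upstream_id] >> run
```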
This DAG builds all of its dbt tasks based on dbt selectors (see the dbt selectors docs). This allows us to take dbt_advanced and break the dbt project out by selector, which is useful when we need to run DAGs at different intervals and times.
CICD Tool
- Loads the `manifest.json`
- Runs `generate_all_model_dependencies`
- Pickles the resulting dependencies
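
A hedged sketch of that CI-side flow; `generate_all_model_dependencies` is named in this repo, but the body below is an assumed implementation, and the manifest path, output path, and shape of the pickled data are illustrative:

```python
# Sketch only: CI-side utility that turns manifest.json into a pickled
# model-dependency map. Paths and the dependency-map structure are assumptions.
import json
import pickle

MANIFEST_PATH = "target/manifest.json"
OUTPUT_PATH = "dbt_dags/data/model_dependencies.pickle"


def generate_all_model_dependencies(manifest: dict) -> dict:
    """Map each dbt model to the models it depends on (assumed implementation)."""
    deps = {}
    for node_id, node in manifest["nodes"].items():
        if node["resource_type"] != "model":
            continue
        deps[node["name"]] = [
            manifest["nodes"][up]["name"]
            for up in node["depends_on"]["nodes"]
            if up in manifest["nodes"]
            and manifest["nodes"][up]["resource_type"] == "model"
        ]
    return deps


if __name__ == "__main__":
    with open(MANIFEST_PATH) as f:
        manifest = json.load(f)

    dependencies = generate_all_model_dependencies(manifest)

    # Pickle the dependency map so the Airflow DAGs can load it without
    # needing manifest.json at parse time.
    with open(OUTPUT_PATH, "wb") as f:
        pickle.dump(dependencies, f)
```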
Airflow DAG
- Loads the pickle file based on selectors
- Generates dbt tasks based on selectors
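
A rough sketch of the DAG side, assuming the pickle holds a per-model dependency map like the one above; the selector-to-model grouping, file paths, and DAG id are assumptions:

```python
# Sketch only: loads the pickled dependency map and builds run tasks for the
# models covered by one selector/schedule. Grouping and paths are assumptions.
import pickle
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_DIR = "/usr/local/airflow/data-cicd"
PICKLE_PATH = "/usr/local/airflow/include/data/model_dependencies.pickle"

with open(PICKLE_PATH, "rb") as f:
    dependencies = pickle.load(f)  # {model_name: [upstream model names]}

# Hypothetical: the subset of models that this DAG's selector covers.
SELECTED_MODELS = {"orders", "customers"}

with DAG(
    dag_id="dbt_selectors_standard_schedule_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    tasks = {
        model: BashOperator(
            task_id=f"run_{model}",
            bash_command=f"cd {DBT_DIR} && dbt run --models {model}",
        )
        for model in SELECTED_MODELS
    }

    # Only wire dependencies that stay inside this selector's model set.
    for model, upstreams in dependencies.items():
        if model not in tasks:
            continue
        for upstream in upstreams:
            if upstream in tasks:
                tasks[upstream] >> tasks[model]
```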
This script lives at `/home/dave/data-engineering/data-cicd/.github/workflows/dependency_graph.py`.
Currently the pickle file is written out to `/home/dave/data-engineering/data-cicd/dbt_dags/data/*`.
This file is a utility script that is run via CircleCI in the deploy step. It is not run via Airflow in any way. The point of this script is to generate a pickle file that contains all of the dependencies between dbt models for each dag (usually corresponding to a different schedule) that we want to run.
- The current dependency pickles live in `include/data/`; they will need to be moved into S3.
Example of using Great Expectations with local data.
NOT working; provides boilerplate for dbt with Great Expectations.