Commit

Add support for referencing dbt snapshots in an Ibis model (#15)
* Adapt to new dbt_packages location in newer versions of dbt

* Add support for referencing snapshots

* Fix tests
binste authored Sep 15, 2023
1 parent 04c8045 commit 23deecb
Showing 10 changed files with 79 additions and 32 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@
.user.yml
demo_project/jaffle_shop/db.duckdb
.vscode
+dbt_packages

# Byte-compiled / optimized / DLL files
__pycache__/
4 changes: 2 additions & 2 deletions README.md
@@ -38,7 +38,7 @@ def model(stores):
    return stores.filter(stores["country"] == "USA")
```

-Whenever your Ibis model references either a source, a seed, or a SQL model, you'll need to define the column data types as described in [Model Contracts - getdbt.com](https://docs.getdbt.com/docs/collaborate/govern/model-contracts) (`data_type` refers to the data types as they are called by your database system) (for sources and SQL models) or in [Seed configurations - getdbt.com](https://docs.getdbt.com/reference/seed-configs) (for seeds). If you reference another Ibis model, this is not necessary. In the examples above, you would need to provide it for the `stores` source table:
+Whenever your Ibis model references either a source, a seed, a snapshot, or a SQL model, you'll need to define the column data types as described in [Model Contracts - getdbt.com](https://docs.getdbt.com/docs/collaborate/govern/model-contracts) (`data_type` refers to the data types as they are called by your database system) (for sources, snapshots, and SQL models) or in [Seed configurations - getdbt.com](https://docs.getdbt.com/reference/seed-configs) (for seeds). If you reference another Ibis model, this is not necessary. In the examples above, you would need to provide it for the `stores` source table:

```yml
sources:
@@ -84,7 +84,7 @@ You might want to configure your editor to treat `.ibis` files as normal Python

## Limitations
* There is no database connection available in the Ibis `model` functions. Hence, you cannot use Ibis functions which would require this.
-* For non-Ibis models, seeds, and for sources, you need to specify the data types of the columns. See "Basic example" above.
+* For non-Ibis models, seeds, snapshots, and for sources, you need to specify the data types of the columns. See "Basic example" above.

## Integration with DBT
There are [discussions](https://github.com/dbt-labs/dbt-core/pull/5274#issuecomment-1132772028) on [adding a plugin system to dbt](https://github.com/dbt-labs/dbt-core/issues/6184) which could be used to provide first-class support for other modeling languages such as Ibis (see [this PoC](https://github.com/dbt-labs/dbt-core/pull/6296) by dbt and the [discussion on Ibis as a dataframe API](https://github.com/dbt-labs/dbt-core/discussions/5738)) or PRQL (see [dbt-prql](https://github.com/PRQL/dbt-prql)).
14 changes: 11 additions & 3 deletions dbt_ibis/__init__.py
@@ -24,7 +24,13 @@
from dbt.cli.main import cli, p, requires
from dbt.config import RuntimeConfig
from dbt.contracts.graph.manifest import Manifest
-from dbt.contracts.graph.nodes import ColumnInfo, ModelNode, SeedNode, SourceDefinition
+from dbt.contracts.graph.nodes import (
+    ColumnInfo,
+    ModelNode,
+    SeedNode,
+    SnapshotNode,
+    SourceDefinition,
+)
from dbt.parser import manifest

_REF_IDENTIFIER_PREFIX: Final = "__ibd_ref__"
@@ -35,7 +41,7 @@
_IBIS_MODEL_FILE_EXTENSION: Final = "ibis"
_IBIS_SQL_FOLDER_NAME: Final = "__ibis_sql"

-_RefLookup = dict[str, Union[ModelNode, SeedNode]]
+_RefLookup = dict[str, Union[ModelNode, SeedNode, SnapshotNode]]
_SourcesLookup = dict[str, dict[str, SourceDefinition]]

logger = logging.getLogger(__name__)
@@ -399,7 +405,9 @@ def _extract_ref_and_source_infos(
    dbt_manifest: Manifest,
) -> tuple[_RefLookup, _SourcesLookup]:
    nodes = list(dbt_manifest.nodes.values())
-    models_and_seeds = [n for n in nodes if isinstance(n, (ModelNode, SeedNode))]
+    models_and_seeds = [
+        n for n in nodes if isinstance(n, (ModelNode, SeedNode, SnapshotNode))
+    ]
    ref_lookup = {m.name: m for m in models_and_seeds}

    sources = dbt_manifest.sources.values()
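
Note on the hunk above: the change widens the `isinstance` filter so that snapshot nodes end up in the name-based ref lookup alongside models and seeds. A minimal, self-contained sketch of that filtering idea follows. The dataclasses are hypothetical stand-ins for the dbt node classes (so the snippet runs without dbt installed), and `build_ref_lookup` and `TestNode` are illustrative names, not part of dbt-ibis or dbt.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for dbt's node classes, used only for illustration.
@dataclass
class ModelNode:
    name: str


@dataclass
class SeedNode:
    name: str


@dataclass
class SnapshotNode:
    name: str


@dataclass
class TestNode:  # an example of a node kind that cannot be referenced via ref()
    name: str


def build_ref_lookup(nodes):
    """Map node names to nodes, keeping only the kinds that ref() can target."""
    referenceable = (ModelNode, SeedNode, SnapshotNode)
    return {n.name: n for n in nodes if isinstance(n, referenceable)}


nodes = [
    ModelNode("stg_stores"),
    SeedNode("raw_orders"),
    SnapshotNode("orders_snapshot"),
    TestNode("not_null_id"),
]
ref_lookup = build_ref_lookup(nodes)
```

In this sketch, `ref_lookup["orders_snapshot"]` resolves to the snapshot node, while the test node is filtered out. Because the lookup is keyed by name only, a later node with the same name as an earlier one would overwrite it.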
3 changes: 2 additions & 1 deletion demo_project/jaffle_shop/README.md
@@ -54,8 +54,9 @@ $ dbt-ibis debug
$ dbt-ibis seed
```

-7. Run the models:
+7. Run the snapshots and models:
```bash
+$ dbt-ibis snapshot
$ dbt-ibis run
```

2 changes: 1 addition & 1 deletion demo_project/jaffle_shop/dbt_project.yml
@@ -16,7 +16,7 @@ macro-paths: ["macros"]
target-path: "target"
clean-targets:
  - "target"
-  - "dbt_modules"
+  - "dbt_packages"
  - "logs"

require-dbt-version: [">=1.0.0", "<2.0.0"]
11 changes: 11 additions & 0 deletions demo_project/jaffle_shop/models/staging/stg_orders.ibis
@@ -0,0 +1,11 @@
from dbt_ibis import depends_on, ref
import ibis.expr.types


# You can use a type hint to help your editor, e.g. VS Code, to provide
# you with autocompletion suggestions.
@depends_on(ref("orders_snapshot"))
def model(orders_snapshot: ibis.expr.types.Table):
    return orders_snapshot.relabel({"id": "order_id", "user_id": "customer_id"}).select(
        "order_id", "customer_id", "order_date", "status"
    )
23 changes: 0 additions & 23 deletions demo_project/jaffle_shop/models/staging/stg_orders.sql

This file was deleted.

22 changes: 22 additions & 0 deletions demo_project/jaffle_shop/snapshots/orders_snapshot.sql
@@ -0,0 +1,22 @@
{% snapshot orders_snapshot %}

{#-
Normally we would select from the table here, but we are mostly using seeds to load
our data in this demo project
#}

{{
    config(
      target_database='db',
      target_schema='snapshots',
      unique_key='id',

      strategy='check',
      check_cols='all'
    )
}}

select *
from {{ ref('raw_orders') }}

{% endsnapshot %}
21 changes: 21 additions & 0 deletions demo_project/jaffle_shop/snapshots/schema.yml
@@ -0,0 +1,21 @@
version: 2

snapshots:
  - name: orders_snapshot
    columns:
      - name: id
        data_type: integer
      - name: user_id
        data_type: integer
      - name: order_date
        data_type: date
      - name: status
        data_type: varchar
      - name: dbt_scd_id
        data_type: varchar
      - name: dbt_updated_at
        data_type: timestamp
      - name: dbt_valid_from
        data_type: timestamp
      - name: dbt_valid_to
        data_type: timestamp
10 changes: 8 additions & 2 deletions tests/test_dbt_ibis.py
@@ -484,7 +484,7 @@ def execute_command(cmd: list[str]) -> None:

def validate_compiled_sql_files(project_dir: Path) -> list[Path]:
    compiled_sql_files = get_compiled_sql_files(project_dir)
-    assert len(compiled_sql_files) == 4
+    assert len(compiled_sql_files) == 5

    # Test content of some of the compiled SQL files
    stg_stores = next(p for p in compiled_sql_files if p.stem == "stg_stores")
@@ -554,10 +554,15 @@ def get_tables() -> list[str]:
    seed_tables = ["raw_orders", "raw_customers", "raw_payments"]
    assert get_tables() == sorted(seed_tables)

+    execute_command(["dbt-ibis", "snapshot"])
+
+    snapshot_tables = ["orders_snapshot"]
+    assert get_tables() == sorted([*seed_tables, *snapshot_tables])
+
    # Only run for a few models at first to make sure that --select
    # is passed through to dbt run
    execute_command(["dbt-ibis", "run", "--select", "stg_orders"])
-    assert get_tables() == sorted([*seed_tables, "stg_orders"])
+    assert get_tables() == sorted([*seed_tables, *snapshot_tables, "stg_orders"])

    execute_command(
        [
@@ -568,6 +573,7 @@ def get_tables() -> list[str]:
    assert get_tables() == sorted(
        [
            *seed_tables,
+            *snapshot_tables,
            "stg_orders",
            "stg_customers",
            "stg_payments",