This section describes how to adapt dbt's incremental models to run on sqlmesh.

SQLMesh supports two approaches to implement [idempotent](../concepts/glossary.md#idempotency) incremental loads:
* Using merge (with the sqlmesh [`INCREMENTAL_BY_UNIQUE_KEY` model kind](../concepts/models/model_kinds.md#incremental_by_unique_key))
* Using the [`INCREMENTAL_BY_TIME_RANGE` model kind](../concepts/models/model_kinds.md#incremental_by_time_range)
#### Incremental by unique key
To enable incremental_by_unique_key incrementality, the model configuration should contain:

* The `unique_key` key with the model's unique key field name as the value
* The `materialized` key with value `'incremental'`
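
As a minimal sketch, such a config could be set in `dbt_project.yml` (the project, model, and column names are hypothetical; the same keys can also be set in the model's `{{ config() }}` block):

```yaml
# dbt_project.yml -- project and model names are hypothetical
models:
  my_project:
    customers:
      +materialized: incremental
      +unique_key: id
```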
#### Incremental by time range
To enable incremental_by_time_range incrementality, the model configuration must contain:

* The `materialized` key with value `'incremental'`
* The `incremental_strategy` key with the value `incremental_by_time_range`
* The `time_column` key with the model's time column field name as the value (see [`time column`](../concepts/models/model_kinds.md#time-column) for details)
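
A sketch of such a config for a hypothetical `events` model with a `ds` time column (names are placeholders; the same keys can also be set in the model's `{{ config() }}` block):

```yaml
# dbt_project.yml -- project, model, and column names are hypothetical
models:
  my_project:
    events:
      +materialized: incremental
      +incremental_strategy: incremental_by_time_range
      +time_column: ds
```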
### Incremental logic
Unlike dbt incremental strategies, SQLMesh does not require the use of `is_incremental` jinja blocks to implement incremental logic. Instead, SQLMesh provides predefined time macro variables that can be used in the model's SQL to filter data based on the time column.

For example, the SQL `WHERE` clause with the "ds" time column would look like this:

```sql
WHERE
  ds BETWEEN '{{ start_ds }}' AND '{{ end_ds }}'
```
`{{ start_ds }}` and `{{ end_ds }}` are the jinja equivalents of SQLMesh's `@start_ds` and `@end_ds` predefined time macro variables. See all [predefined time variables](../concepts/macros/macro_variables.md) available in jinja.

SQLMesh provides configuration parameters that enable control over how incremental computations occur. These parameters are set in the model's `config` block.
See [Incremental Model Properties](../concepts/models/overview.md#incremental-model-properties) for the full list of incremental model configuration parameters.

**Note:** By default, all incremental dbt models are configured to be [forward-only](../concepts/plans.md#forward-only-plans). However, you can change this behavior by setting `forward_only: false` either in the configuration of an individual model or globally for all models in the `dbt_project.yml` file. The [forward-only](../concepts/plans.md#forward-only-plans) mode aligns more closely with the typical operation of dbt and therefore better meets users' expectations.

Similarly, the [allow_partials](../concepts/models/overview.md#allow_partials) parameter is set to `true` by default unless it is explicitly set to `false` in the model configuration.
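
As a sketch, both defaults could be overridden globally in `dbt_project.yml` (the project name is hypothetical, and whether `allow_partials` may be set globally rather than per model is an assumption):

```yaml
# dbt_project.yml -- project name is hypothetical
models:
  my_project:
    +forward_only: false
    +allow_partials: false
```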
## Trino connection options

| Option | Description | Type | Required |
|--------|-------------|------|----------|
| `type` | Engine type name - must be `trino` | string | Y |
| `user` | The username (of the account) to log in to your cluster. When connecting to Starburst Galaxy clusters, you must include the role of the user as a suffix to the username. | string | Y |
| `host` | The hostname of your cluster. Don't include the `http://` or `https://` prefix. | string | Y |
| `catalog` | The name of a catalog in your cluster. | string | Y |
| `http_scheme` | The HTTP scheme to use when connecting to your cluster. By default, it's `https` and can only be `http` for no-auth or basic auth. | string | N |
| `port` | The port to connect to your cluster. By default, it's `443` for the `https` scheme and `80` for `http`. | int | N |
| `roles` | Mapping of catalog name to a role. | dict | N |
| `http_headers` | Additional HTTP headers to send with each request. | dict | N |
| `session_properties` | Trino session properties. Run `SHOW SESSION` to see all options. | dict | N |
| `retries` | Number of retries to attempt when a request fails. Default: `3`. | int | N |
| `timezone` | Timezone to use for the connection. Default: client-side local timezone. | string | N |
| `schema_location_mapping` | A mapping of regex patterns to S3 locations to use for the `LOCATION` property when creating schemas. See [Table and Schema locations](#table-and-schema-locations) for more details. | dict | N |
| `catalog_type_overrides` | A mapping of catalog names to their connector type. This is used to enable/disable connector-specific behavior. See [Catalog Type Overrides](#catalog-type-overrides) for more details. | dict | N |
## Table and Schema locations
Specifying a table's location in its `physical_properties` will cause SQLMesh to set the specified `LOCATION` when issuing a `CREATE TABLE` statement.
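
Schema-level locations are controlled by the `schema_location_mapping` connection property described above; a minimal sketch, where the regex patterns and bucket names are hypothetical:

```yaml
# connection config sketch -- patterns and buckets are hypothetical
connection:
  type: trino
  # ...user, host, catalog, and other required options...
  schema_location_mapping:
    '^analytics$': 's3://warehouse-bucket/analytics'
    '^staging_.*$': 's3://staging-bucket/schemas'
```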
## Catalog Type Overrides
SQLMesh attempts to determine the connector type of a catalog by querying the `system.metadata.catalogs` table and checking the `connector_name` column. It checks whether the connector name is `hive` for Hive connector behavior, or contains `iceberg` or `delta_lake` for Iceberg or Delta Lake connector behavior, respectively. However, the connector name may not always be a reliable way to determine the connector type, for example when using a custom connector or a fork of an existing connector. To handle such cases, you can use the `catalog_type_overrides` connection property to explicitly specify the connector type for specific catalogs.

For example, to specify that the `datalake` catalog is using the Iceberg connector and the `analytics` catalog is using the Hive connector, you can configure the connection as follows:
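
A minimal sketch of such a connection (the gateway name, user, and host are placeholders, and the override values assume the built-in connector type names `hive`, `iceberg`, and `delta_lake` mentioned above):

```yaml
# config.yaml sketch -- gateway, user, and host are hypothetical
gateways:
  trino:
    connection:
      type: trino
      user: admin
      host: trino.example.com
      catalog: datalake
      catalog_type_overrides:
        datalake: iceberg
        analytics: hive
```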