Commit 1c82242

Add documentation for disabling snapshots (#115)
1 parent 6118558 commit 1c82242

3 files changed: +18 -1 lines changed

artifacts.md

Lines changed: 16 additions & 0 deletions
@@ -38,3 +38,19 @@ db.save(wines, 'wines_2', 'replace')
 ```

 The specific requirements for each data system will vary based on what data types they accept and what metadata they need. To learn more, visit the documentation page for the data system you're using.
+
+
+## Disabling Snapshots on Artifacts
+By default, Aqueduct snapshots all artifact results during workflow runs. You can toggle this setting when [publishing the workflow](./workflows/creating-a-workflow.md#publishing-a-workflow) by setting the `disable_snapshots` flag to `True`. You can also call `.disable_snapshot()` on individual artifacts. For example:
+
+```python
+import aqueduct as aq
+client = aq.Client()
+
+db = client.resource('aqueduct_demo')
+wines = db.sql('SELECT * FROM wine;')
+wines.disable_snapshot() # This disables snapshots for wines
+
+db.save(wines, 'wines_2', 'replace')
+```
+On an artifact with snapshotting disabled, you can call `enable_snapshot()` to reenable snapshots.
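
To make the interaction between the workflow-level `disable_snapshots` flag and the per-artifact toggles easier to reason about, here is a minimal toy model in plain Python. It does not use or reproduce the Aqueduct SDK: `Kind`, `ToyArtifact`, `EXEMPT_KINDS`, and `snapshotted_names` are hypothetical names invented for illustration. The precedence rule (the workflow-level flag overriding per-artifact settings) is an assumption; the documentation in this commit only states that the flag leaves parameters, metrics, and checks snapshotted.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    TABLE = "table"
    PARAMETER = "parameter"
    METRIC = "metric"
    CHECK = "check"


# Per the docs in this commit, the workflow-level flag still snapshots these kinds.
EXEMPT_KINDS = {Kind.PARAMETER, Kind.METRIC, Kind.CHECK}


@dataclass
class ToyArtifact:
    """Hypothetical stand-in for an artifact; not an Aqueduct class."""
    name: str
    kind: Kind = Kind.TABLE
    snapshot_enabled: bool = True  # snapshots are on by default

    def disable_snapshot(self) -> None:
        self.snapshot_enabled = False

    def enable_snapshot(self) -> None:
        self.snapshot_enabled = True


def snapshotted_names(artifacts, disable_snapshots=False):
    """Return the names of artifacts whose results would be stored."""
    if disable_snapshots:
        # ASSUMPTION: the workflow-level flag overrides per-artifact settings.
        return [a.name for a in artifacts if a.kind in EXEMPT_KINDS]
    return [a.name for a in artifacts if a.snapshot_enabled]


wines = ToyArtifact("wines")
avg_ph = ToyArtifact("avg_ph", Kind.METRIC)

print(snapshotted_names([wines, avg_ph]))   # both stored by default
wines.disable_snapshot()
print(snapshotted_names([wines, avg_ph]))   # only the metric is stored
wines.enable_snapshot()
print(snapshotted_names([wines, avg_ph], disable_snapshots=True))
```

In this sketch, the per-artifact toggle is the finer-grained control, while the workflow-level flag is a blanket switch that leaves only the exempt kinds stored. How the real SDK resolves conflicts between the two is not stated in the source.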

workflows/creating-a-workflow.md

Lines changed: 1 addition & 0 deletions
@@ -88,6 +88,7 @@ There are a few key arguments here, and we'll go through the one by one:
 * You're, of course, welcome to list out all of the data artifacts in your workflow, but we figured it would be easier to list the outputs you care about.
 * `schedule`: This tells us how often you'd like to run your workflow. If you leave this empty, no schedule will be set. Otherwise, you can set a schedule that executes as quickly as every minute or as rarely as every month. (See [managing-workflow-schedules.md](managing-workflow-schedules.md "mention") for more details.)
 * `config`: This tells us which connected engine you'd like to run your workflow on. If you leave this empty, the workflow will be executed by the Aqueduct engine by default. We currently support Airflow, Kubernetes, and Lambda.
+* `disable_snapshots`: This tells us whether to stop storing snapshots of intermediate results. When set to `True`, it disables all snapshots except those for parameters, metrics, and checks. Additionally, you can toggle this behavior on individual [artifacts](../artifacts.md#disabling-snapshots-on-artifacts) by calling `.disable_snapshot()` on the artifact.

 Finally, you'll notice that we print `flow.id()` at the end of our workflow. This shows you the UUID assigned to your workflow, which you can use to access the workflow from the Python SDK in the future.

workflows/workflow-versions.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ Workflows are versioned on every execution. You can see previous versions on by

 On every execution, Aqueduct automatically captures the code that was executed as well as _all the data that was generated by the workflow_ -- this includes intermediary data artifacts that aren't explicitly saved by the workflow. Aqueduct captures this information on each workflow run in order to help debug workflows, both within an individual execution and across time.

-By default, the snapshots of intermediary data are stored on your local filesystem. **NOTE**: We are working on making this feature configurable.
+By default, the snapshots of intermediary data are stored on your local filesystem. You can toggle this setting when [publishing the workflow](./creating-a-workflow.md#publishing-a-workflow).

 Of course, this data can quickly get out of hand, so Aqueduct also automatically prunes version history. The default setting maintains versions for 7 days.
