From 1bcb440cd39e11011f3996987adcf6fbf02aa893 Mon Sep 17 00:00:00 2001 From: "Aaron (\"AJ\") Steers" Date: Tue, 30 Jul 2024 14:33:47 -0700 Subject: [PATCH] Docs: Clean up readme and module docs (#316) --- README.md | 61 ++---------- airbyte/__init__.py | 120 +++++++++++++++++++++++- airbyte/secrets/__init__.py | 63 ++++++++++++- CONTRIBUTING.md => docs/CONTRIBUTING.md | 4 +- docs/faq.md | 36 +++++++ pyproject.toml | 2 + 6 files changed, 228 insertions(+), 58 deletions(-) rename CONTRIBUTING.md => docs/CONTRIBUTING.md (94%) create mode 100644 docs/faq.md diff --git a/README.md b/README.md index 6c0aaed8..7817b528 100644 --- a/README.md +++ b/README.md @@ -5,19 +5,8 @@ PyAirbyte brings the power of Airbyte to every Python developer. PyAirbyte provi [![PyPI version](https://badge.fury.io/py/airbyte.svg)](https://badge.fury.io/py/airbyte) [![PyPI - Downloads](https://img.shields.io/pypi/dm/airbyte)](https://pypi.org/project/airbyte/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/airbyte)](https://pypi.org/project/airbyte/) - -[![PyPI - Wheel](https://img.shields.io/pypi/wheel/airbyte)](https://pypi.org/project/airbyte/) - -[![PyPI - Implementation](https://img.shields.io/pypi/implementation/airbyte)](https://pypi.org/project/airbyte/) -[![PyPI - Format](https://img.shields.io/pypi/format/airbyte)](https://pypi.org/project/airbyte/) [![Star on GitHub](https://img.shields.io/github/stars/airbytehq/pyairbyte.svg?style=social&label=★%20on%20GitHub)](https://github.com/airbytehq/pyairbyte) -- [Getting Started](#getting-started) -- [Secrets Management](#secrets-management) -- [Connector compatibility](#connector-compatibility) -- [Contributing](#contributing) -- [Frequently asked Questions](#frequently-asked-questions) - ## Getting Started Watch this [Getting Started Loom video](https://www.loom.com/share/3de81ca3ce914feca209bf83777efa3f?sid=8804e8d7-096c-4aaa-a8a4-9eb93a44e850) or run one of our Quickstart tutorials below to see how you can use PyAirbyte in your python code. @@ -29,62 +18,26 @@ Watch this [Getting Started Loom video](https://www.loom.com/share/3de81ca3ce914 * [GitHub](https://github.com/airbytehq/quickstarts/blob/main/pyairbyte_notebooks/PyAirbyte_Github_Incremental_Demo.ipynb) * [Postgres (cache)](https://github.com/airbytehq/quickstarts/blob/main/pyairbyte_notebooks/PyAirbyte_Postgres_Custom_Cache_Demo.ipynb) - -## Secrets Management - -PyAirbyte can auto-import secrets from the following sources: - -1. Environment variables. -2. Variables defined in a local `.env` ("Dotenv") file. -3. [Google Colab secrets](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75). -4. Manual entry via [`getpass`](https://docs.python.org/3.9/library/getpass.html). - -_Note: You can also build your own secret manager by subclassing the `CustomSecretManager` implementation. For more information, see the `airbyte.secrets.CustomSecretManager` class definiton._ - -### Retrieving Secrets - -```python -import airbyte as ab - -source = ab.get_source("source-github") -source.set_config( - "credentials": { - "personal_access_token": ab.get_secret("GITHUB_PERSONAL_ACCESS_TOKEN"), - } -) -``` - -By default, PyAirbyte will search all available secrets sources. The `get_secret()` function also accepts an optional `sources` argument of specific source names (`SecretSourceEnum`) and/or secret manager objects to check. - -By default, PyAirbyte will prompt the user for any requested secrets that are not provided via other secret managers. 
-You can disable this prompt by passing `allow_prompt=False` to `get_secret()`.
-
-For more information, see the `airbyte.secrets` module.
-
-### Secrets Auto-Discovery
-
-If you have a secret matching an expected name, PyAirbyte will automatically use it. For example, if you have a secret named `GITHUB_PERSONAL_ACCESS_TOKEN`, PyAirbyte will automatically use it when configuring the GitHub source.
-
-The naming convention for secrets is as `{CONNECTOR_NAME}_{PROPERTY_NAME}`, for instance `SNOWFLAKE_PASSWORD` and `BIGQUERY_CREDENTIALS_PATH`.
-
-PyAirbyte will also auto-discover secrets for interop with hosted Airbyte: `AIRBYTE_CLOUD_API_URL`, `AIRBYTE_CLOUD_API_KEY`, etc.
-
 ## Contributing
-To learn how you can contribute to PyAirbyte, please see our [PyAirbyte Contributors Guide](./CONTRIBUTING.md).
+To learn how you can contribute to PyAirbyte, please see our [PyAirbyte Contributors Guide](./docs/CONTRIBUTING.md).
 
 ## Frequently asked Questions
 
 **1. Does PyAirbyte replace Airbyte?**
-No.
+No. PyAirbyte is a Python library that allows you to use Airbyte connectors in Python, but it does not have orchestration
+or scheduling capabilities, nor does it provide logging, alerting, or other features for managing pipelines in
+production. Airbyte is a full-fledged data integration platform that provides connectors, orchestration, and scheduling capabilities.
 
 **2. What is the PyAirbyte cache? Is it a destination?**
-Yes, you can think of it as a built-in destination implementation, but we avoid the word "destination" in our docs to prevent confusion with our certified destinations list [here](https://docs.airbyte.com/integrations/destinations/).
+Yes and no. You can think of it as a built-in destination implementation, but we avoid the word "destination" in our docs to prevent confusion with our certified destinations list [here](https://docs.airbyte.com/integrations/destinations/).
 
 **3. Does PyAirbyte work with data orchestration frameworks like Airflow, Dagster, and Snowpark,**
 Yes, it should. Please give it a try and report any problems you see. Also, drop us a note if works for you!
 
 **4. Can I use PyAirbyte to develop or test when developing Airbyte sources?**
-Yes, you can, but only for Python-based sources.
+Yes, you can. PyAirbyte makes it easy to test connectors in Python, and you can use it to develop new connectors
+locally as well as to work with existing, already-published ones.
 
 **5. Can I develop traditional ETL pipelines with PyAirbyte?**
 Yes. Just pick the cache type matching the destination - like SnowflakeCache for landing data in Snowflake.
diff --git a/airbyte/__init__.py b/airbyte/__init__.py
index 8c2ae4bb..8e143693 100644
--- a/airbyte/__init__.py
+++ b/airbyte/__init__.py
@@ -1,9 +1,125 @@
 # Copyright (c) 2024 Airbyte, Inc., all rights reserved.
 """PyAirbyte brings Airbyte ELT to every Python developer.
 
-.. include:: ../README.md
+PyAirbyte brings the power of Airbyte to every Python developer. PyAirbyte provides a set of
+utilities to use Airbyte connectors in Python.
-## API Reference
+[![PyPI version](https://badge.fury.io/py/airbyte.svg)](https://badge.fury.io/py/airbyte)
+[![PyPI - Downloads](https://img.shields.io/pypi/dm/airbyte)](https://pypi.org/project/airbyte/)
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/airbyte)](https://pypi.org/project/airbyte/)
+[![Star on GitHub](https://img.shields.io/github/stars/airbytehq/pyairbyte.svg?style=social&label=★%20on%20GitHub)](https://github.com/airbytehq/pyairbyte)
+
+# Getting Started
+
+## Reading Data
+
+You can connect to any of [hundreds of sources](https://docs.airbyte.com/integrations/sources/)
+using the `get_source` method. You can then read data from sources using the `Source.read` method.
+
+```python
+from airbyte import get_source
+
+source = get_source(
+    "source-faker",
+    config={},
+)
+read_result = source.read()
+
+for record in read_result["users"].records:
+    print(record)
+```
+
+For more information, see the `airbyte.sources` module.
+
+## Writing to SQL Caches
+
+Data can be written to caches using a number of SQL-based cache implementations, including
+Postgres, BigQuery, Snowflake, DuckDB, and MotherDuck. If you do not specify a cache, PyAirbyte
+will automatically use a local DuckDB cache by default.
+
+For more information, see the `airbyte.caches` module.
+
+## Writing to Destination Connectors
+
+Data can be written to destinations using the `Destination.write` method. You can connect to
+destinations using the `get_destination` method. PyAirbyte supports all Airbyte destinations, but
+Docker is required on your machine in order to run Java-based destinations.
+
+**Note:** When loading to a SQL database, we recommend using a SQL cache (where available,
+[see above](#writing-to-sql-caches)) instead of a destination connector. This is because SQL caches
+are Python-native and therefore more portable when run from different Python-based environments that
+might not have Docker container support. Destinations in PyAirbyte are uniquely suited for loading
+to non-SQL platforms such as vector stores and other reverse ETL-type use cases.
+
+For more information, see the `airbyte.destinations` module and the full list of destination
+connectors [here](https://docs.airbyte.com/integrations/destinations/).
+
+# PyAirbyte API
+
+## Importing as `ab`
+
+Most examples in the PyAirbyte documentation use the `import airbyte as ab` convention. The `ab`
+alias is recommended because it makes code more concise and readable. When getting started, this
+also saves you from digging through submodules to find the classes and functions you need, since
+frequently-used classes and functions are available at the top level of the `airbyte` module.
+
+## Navigating the API
+
+While many PyAirbyte classes and functions are available at the top level of the `airbyte` module,
+you can also import classes and functions from submodules directly. For example, while you can
+import the `Source` class from `airbyte`, you can also import it from the `sources` submodule like
+this:
+
+```python
+from airbyte.sources import Source
+```
+
+Whether you import from the top level or from a submodule, the classes and functions are the same.
+We expect that most users will import from the top level when getting started, and then import from
+submodules when they are deploying more complex implementations.
+
+For quick reference, top-level modules are listed in the left sidebar of this page.
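+
+For example, the two short snippets below show equivalent ways to declare the `source-faker`
+source from the Getting Started example above, assuming `get_source` is also re-exported from the
+`sources` submodule:
+
+```python
+# Using the recommended top-level alias:
+import airbyte as ab
+
+source = ab.get_source("source-faker", config={})
+```
+
+```python
+# Equivalent, importing directly from the submodule:
+from airbyte.sources import get_source
+
+source = get_source("source-faker", config={})
+```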
+
+# Other Resources
+
+- [PyAirbyte GitHub README](https://github.com/airbytehq/pyairbyte)
+- [PyAirbyte Issue Tracker](https://github.com/airbytehq/pyairbyte/issues)
+- [Frequently Asked Questions](https://github.com/airbytehq/PyAirbyte/blob/main/docs/faq.md)
+- [PyAirbyte Contributors Guide](https://github.com/airbytehq/PyAirbyte/blob/main/docs/CONTRIBUTING.md)
+- [GitHub Releases](https://github.com/airbytehq/PyAirbyte/releases)
+
+----------------------
+
+# API Reference
+
+Below is a list of all classes, functions, and modules available in the top-level `airbyte`
+module. (This is a long list!) If you are just starting out, we recommend beginning by selecting a
+submodule to navigate to, either from the left sidebar or from the list below. Each module
+has its own documentation and code samples showing how to use its capabilities effectively.
+
+- **`airbyte.cloud`** - Working with Airbyte Cloud, including running jobs remotely.
+- **`airbyte.caches`** - Working with caches, including how to inspect a cache and get data from it.
+- **`airbyte.datasets`** - Working with datasets, including how to read from datasets and convert to
+  other formats, such as Pandas, Arrow, and LLM Document formats.
+- **`airbyte.destinations`** - Working with destinations, including how to write to Airbyte
+  destination connectors.
+- **`airbyte.documents`** - Working with LLM documents, including how to convert records into
+  document formats, for instance, when working with AI libraries like LangChain.
+- **`airbyte.exceptions`** - Definitions of all exception and warning classes used in PyAirbyte.
+- **`airbyte.experimental`** - Experimental features and utilities that do not yet have a stable
+  API.
+- **`airbyte.records`** - Internal record handling classes.
+- **`airbyte.results`** - Documents the classes returned when working with results from
+  `Source.read` and `Destination.write`.
+- **`airbyte.secrets`** - Tools for managing secrets in PyAirbyte.
+- **`airbyte.sources`** - Tools for creating and reading from Airbyte sources. This includes
+  `airbyte.sources.get_source` to declare a source, `airbyte.sources.Source.read` for reading data,
+  and `airbyte.sources.Source.get_records()` to peek at records without caching or writing them
+  directly.
+
+----------------------
 """
diff --git a/airbyte/secrets/__init__.py b/airbyte/secrets/__init__.py
index 156772df..6931d157 100644
--- a/airbyte/secrets/__init__.py
+++ b/airbyte/secrets/__init__.py
@@ -1,5 +1,66 @@
 # Copyright (c) 2023 Airbyte, Inc., all rights reserved.
-"""Secrets management for PyAirbyte."""
+"""Secrets management for PyAirbyte.
+
+PyAirbyte provides a secrets management system that allows you to securely store and retrieve
+sensitive information. This module provides that functionality.
+
+## Secrets Management
+
+PyAirbyte can auto-import secrets from the following sources:
+
+1. Environment variables.
+2. Variables defined in a local `.env` ("Dotenv") file.
+3. [Google Colab secrets](https://medium.com/@parthdasawant/how-to-use-secrets-in-google-colab-450c38e3ec75).
+4. Manual entry via [`getpass`](https://docs.python.org/3.9/library/getpass.html).
+
+**Note:** You can also build your own secret manager by subclassing the `CustomSecretManager`
+implementation. For more information, see the `airbyte.secrets.CustomSecretManager` reference docs.
+
+### Retrieving Secrets
+
+To retrieve a secret, use the `get_secret()` function. For example:
+
+```python
+import airbyte as ab
+
+source = ab.get_source("source-github")
+source.set_config(
+    {
+        "credentials": {
+            "personal_access_token": ab.get_secret("GITHUB_PERSONAL_ACCESS_TOKEN"),
+        }
+    }
+)
+```
+
+By default, PyAirbyte will search all available secrets sources. The `get_secret()` function also
+accepts an optional `sources` argument of specific source names (`SecretSourceEnum`) and/or secret
+manager objects to check.
+
+By default, PyAirbyte will prompt the user for any requested secrets that are not provided via other
+secret managers. You can disable this prompt by passing `allow_prompt=False` to `get_secret()`.
+
+### Secrets Auto-Discovery
+
+If you have a secret matching an expected name, PyAirbyte will automatically use it. For example, if
+you have a secret named `GITHUB_PERSONAL_ACCESS_TOKEN`, PyAirbyte will automatically use it when
+configuring the GitHub source.
+
+The naming convention for secrets is `{CONNECTOR_NAME}_{PROPERTY_NAME}`, for instance
+`SNOWFLAKE_PASSWORD` and `BIGQUERY_CREDENTIALS_PATH`.
+
+PyAirbyte will also auto-discover secrets for interop with hosted Airbyte: `AIRBYTE_CLOUD_API_URL`,
+`AIRBYTE_CLOUD_API_KEY`, etc.
+
+## Custom Secret Managers
+
+If you need to build your own secret manager, you can subclass the
+`airbyte.secrets.CustomSecretManager` class. Your custom secret manager can then be used with the
+`get_secret()` function, securely storing and retrieving secrets as needed.
+
+## API Reference
+
+_Below are the classes and functions available in the `airbyte.secrets` module._
+
+"""

 from __future__ import annotations
diff --git a/CONTRIBUTING.md b/docs/CONTRIBUTING.md
similarity index 94%
rename from CONTRIBUTING.md
rename to docs/CONTRIBUTING.md
index 566408f3..d0e16cd4 100644
--- a/CONTRIBUTING.md
+++ b/docs/CONTRIBUTING.md
@@ -16,9 +16,11 @@ Regular documentation lives in the `/docs` folder. Based on the doc strings of p
 To generate the documentation, run:
 
 ```console
-poetry run generate-docs
+poe generate-docs
 ```
 
+or `poetry run poe generate-docs` if you don't have [Poe](https://poethepoet.natn.io/index.html) installed.
+
 The `generate-docs` CLI command is mapped to the `run()` function of `docs/generate.py`.
 
 Documentation pages will be generated in the `docs/generated` folder. The `test_docs.py` test in pytest will automatically update generated content. This updates must be manually committed before docs tests will pass.
diff --git a/docs/faq.md b/docs/faq.md
new file mode 100644
index 00000000..0b7c9b85
--- /dev/null
+++ b/docs/faq.md
@@ -0,0 +1,36 @@
+# PyAirbyte Frequently asked Questions
+
+**1. Does PyAirbyte replace Airbyte?**
+
+No. PyAirbyte is a Python library that allows you to use Airbyte connectors in Python, but it does
+not have orchestration or scheduling capabilities, nor does it provide logging, alerting, or other
+features for managing data pipelines in production. Airbyte is a full-fledged data integration
+platform that provides connectors, orchestration, and scheduling capabilities.
+
+**2. What is the PyAirbyte cache? Is it a destination?**
+
+Yes and no. You can think of it as a built-in destination implementation, but we avoid the word
+"destination" in our docs to prevent confusion with our certified destinations list
+[here](https://docs.airbyte.com/integrations/destinations/).
+
+**3. Does PyAirbyte work with data orchestration frameworks like Airflow, Dagster, Snowpark,
+etc.?**
+
+Yes, it should. Please give it a try and report any problems you see. Also, drop us a note if it works
+for you!
+
+**4. Can I use PyAirbyte to develop or test when developing Airbyte sources?**
+
+Yes, you can. PyAirbyte makes it easy to test connectors in Python, and you can use it to develop
+new connectors locally as well as to work with existing, already-published ones.
+
+**5. Can I develop traditional ETL pipelines with PyAirbyte?**
+
+Yes. Just pick the cache type matching the destination - like SnowflakeCache for landing data in
+Snowflake.
+
+**6. Can PyAirbyte import a connector from a local directory that has Python project files, or does
+it have to be installed from PyPI?**
+
+Yes, PyAirbyte can use any local install that has a CLI - and will automatically find connectors by
+name if they are on PATH.
diff --git a/pyproject.toml b/pyproject.toml
index 678c1386..804deb77 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -322,6 +322,8 @@ coverage-reset = { shell = "coverage erase" }
 
 check = { shell = "ruff check . && mypy . && pytest --collect-only -qq" }
 
+docs-generate = {env = {PDOC_ALLOW_EXEC = "1"}, shell = "generate-docs && open docs/generated/index.html" }
+
 fix = { shell = "ruff format . && ruff check --fix -s || ruff format ." }
 fix-unsafe = { shell = "ruff format . && ruff check --fix --unsafe-fixes . && ruff format ." }
 fix-and-check = { shell = "poe fix && poe check" }