#2166 databricks direct loading #2219

Merged 28 commits on Feb 1, 2025

Commits
9d560d9
databricks: enable local files
donotpush Jan 15, 2025
902c49d
fix: databricks test config
donotpush Jan 15, 2025
1efe565
work in progress
donotpush Jan 15, 2025
b60b3d3
added create and drop volume to interface
donotpush Jan 16, 2025
e772d20
refactor direct load authentication
donotpush Jan 20, 2025
2bd0be0
fix databricks volume file name
donotpush Jan 20, 2025
7641bcf
refactor databricks direct loading
donotpush Jan 22, 2025
627b985
format and lint
donotpush Jan 22, 2025
91c0028
revert config.toml changes
donotpush Jan 22, 2025
de29126
force notebook auth
donotpush Jan 23, 2025
d288f11
enhanced config validations
donotpush Jan 23, 2025
37ca7f4
force exception
donotpush Jan 23, 2025
e89548f
fix config resolve
donotpush Jan 23, 2025
c423929
remove imports
donotpush Jan 24, 2025
f0c7208
test: config exceptions
donotpush Jan 24, 2025
1271e22
restore comments
donotpush Jan 24, 2025
aec0d45
restored destination_config
donotpush Jan 24, 2025
3700850
fix pokemon api values
donotpush Jan 24, 2025
730ff47
enables databricks no stage tests
rudolfix Jan 29, 2025
acece59
fix databricks config on_resolved
donotpush Jan 29, 2025
7de861c
adjusted direct load file management
donotpush Jan 29, 2025
0aab8d4
direct load docs
donotpush Jan 29, 2025
4998163
filters by bucket when subset of destinations is set when creating te…
rudolfix Jan 30, 2025
799c41d
simpler file upload
donotpush Jan 30, 2025
9ba1801
fix comment
donotpush Jan 30, 2025
2c54ed9
passes authentication directly from workspace, adds proper fingerprin…
rudolfix Jan 30, 2025
18b2bd8
use real client_id in tests
rudolfix Jan 31, 2025
428c075
fixes config resolver to not pass NotResolved hints to config providers
rudolfix Feb 1, 2025
restore comments
donotpush committed Jan 24, 2025
commit 1271e221208093bc09941d47ada955d6c6728b54
dlt/destinations/impl/databricks/databricks.py (10 changes: 7 additions & 3 deletions)
@@ -67,7 +67,7 @@ def run(self) -> None:
             self._handle_staged_file()
         )
 
-        # Determine the source format and any additional format options
+        # decide on source format, file_name will either be a local file or a bucket path
         source_format, format_options_clause, skip_load = self._determine_source_format(
             file_name, orig_bucket_path
         )
@@ -172,14 +172,16 @@ def _handle_staged_file(self) -> tuple[str, str, str, str]:
         credentials_clause = ""
 
         if self._job_client.config.is_staging_external_location:
-            # skip the credentials clause
+            # just skip the credentials clause for external location
+            # https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html#external-location
             pass
         elif self._job_client.config.staging_credentials_name:
-            # named credentials
+            # add named credentials
            credentials_clause = (
                 f"WITH(CREDENTIAL {self._job_client.config.staging_credentials_name} )"
             )
         else:
+            # referencing staged files via a bucket URL requires explicit AWS credentials
             if bucket_scheme == "s3":
                 assert isinstance(staging_credentials, AwsCredentialsWithoutDefaults)
                 s3_creds = staging_credentials.to_session_credentials()
@@ -192,6 +194,7 @@ def _handle_staged_file(self) -> tuple[str, str, str, str]:
                 assert isinstance(
                     staging_credentials, AzureCredentialsWithoutDefaults
                 ), "AzureCredentialsWithoutDefaults required to pass explicit credential"
+                # Explicit azure credentials are needed to load from bucket without a named stage
                 credentials_clause = f"""WITH(CREDENTIAL(AZURE_SAS_TOKEN='{staging_credentials.azure_storage_sas_token}'))"""
                 bucket_path = self.ensure_databricks_abfss_url(
                     bucket_path,
@@ -216,6 +219,7 @@ def _handle_staged_file(self) -> tuple[str, str, str, str]:
                     staging_credentials.azure_account_host,
                 )
 
+        # always add FROM clause
         from_clause = f"FROM '{bucket_path}'"
 
         return from_clause, credentials_clause, file_name, orig_bucket_path
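
For reference, a rough sketch (not the actual job code) of how the clauses built in _handle_staged_file() compose into the final Databricks COPY INTO statement; the table name, bucket path, and SAS token below are hypothetical stand-ins:

# hypothetical stand-ins for the values _handle_staged_file() returns
bucket_path = "abfss://container@account.dfs.core.windows.net/dataset/data.parquet"
from_clause = f"FROM '{bucket_path}'"
credentials_clause = "WITH(CREDENTIAL(AZURE_SAS_TOKEN='<sas-token>'))"

# the job then renders a COPY INTO statement along these lines
statement = f"""COPY INTO my_catalog.my_schema.my_table
{from_clause}
{credentials_clause}
FILEFORMAT = PARQUET
"""
print(statement)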