
Conversation


@yyoli-db yyoli-db commented Dec 3, 2025

Changes

Update the documentation of Switch for conversion to Spark Declarative Pipeline (SDP).
This depends on the following Switch PR: https://github.com/databrickslabs/switch/pull/46

What does this PR do?

Relevant implementation details

This is a documentation update.

Caveats/things to watch out for when reviewing:

Linked issues

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge ...
  • ... +add your own

Tests

  • manually tested
  • added unit tests
  • added integration tests

@yyoli-db yyoli-db requested a review from a team as a code owner December 3, 2025 08:59

codecov bot commented Dec 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.56%. Comparing base (a8f5b2c) to head (ba0662f).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2174   +/-   ##
=======================================
  Coverage   63.56%   63.56%           
=======================================
  Files         100      100           
  Lines        8503     8503           
  Branches      885      885           
=======================================
  Hits         5405     5405           
  Misses       2931     2931           
  Partials      167      167           



github-actions bot commented Dec 3, 2025

✅ 51/51 passed, 4 flaky, 5m6s total

Flaky tests:

  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (20.818s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (24.774s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (23.413s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (3.996s)

Running from acceptance #3193


| Source Technology | Source → Target |
|--------------|-----------------|
| `pyspark` | PySpark ETL → Databricks Notebook in Python or SQL for SDP |
Contributor

Could we add a brief note about `unknown_etl`? It would help clarify that ETL types other than those listed above also transpile to Databricks Notebook in Python or SQL for SDP.

For example:

| ETL Type | Output |
|---------------|-----------------|
| `unknown_etl` | Any other ETL → Databricks Notebook in Python or SQL for SDP |

Also, just a thought: would it make more sense to rename `unknown_etl` to something like `other_etl`? It might be clearer for users. I understand this would require changes on the Switch side as well, so it's just a suggestion.

Comment on lines +363 to +367
%% SDP validation pipeline operations
validate_sdp -.-> export_notebook[export notebook]
export_notebook -.-> create_pipeline[create pipeline]
create_pipeline -.-> update_pipeline[update pipeline for validation]
update_pipeline -.-> delete_pipeline[delete pipeline]
Contributor

Suggestion: Move SDP pipeline operations detail to Processing Steps section

The Notebook Conversion Flow diagram has become quite lengthy after adding validate_sdp support. The dotted lines showing the pipeline operations (export_notebook → create_pipeline → update_pipeline → delete_pipeline) make the diagram harder to follow at a glance.

Suggestion: Remove the SDP pipeline operation details from the main diagram and document them in the ### validate_sdp section under Processing Steps instead.

Current diagram includes:

%% SDP validation pipeline operations
validate_sdp -.-> export_notebook[export notebook]
export_notebook -.-> create_pipeline[create pipeline]
create_pipeline -.-> update_pipeline[update pipeline for validation]
update_pipeline -.-> delete_pipeline[delete pipeline]

Proposed change:

  1. Remove these 4 dotted lines from the Notebook Conversion Flow diagram
  2. Expand the ### validate_sdp section in Processing Steps with a table:
### validate_sdp
Performs Spark Declarative Pipeline validation on the generated code. The validation process executes these steps sequentially:

| Step | Description |
|------|-------------|
| Export Notebook | Writes the converted code to a temporary notebook in the workspace |
| Create Pipeline | Creates a temporary Spark Declarative Pipeline referencing the notebook |
| Update Pipeline | Runs a validation-only update to check for SDP syntax errors |
| Delete Pipeline | Cleans up the temporary pipeline after validation |

Note: `TABLE_OR_VIEW_NOT_FOUND` errors are ignored during validation.

This approach:

  • Keeps the main flow diagram readable
  • Documents SDP details in a logical location (under the validate_sdp processing step)
  • Uses a simple table format instead of an additional mermaid diagram
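
To make the sequence concrete, here is a minimal sketch of what these four steps could look like with the Databricks Python SDK. The function name, the serverless/compute settings, and the error-matching logic are illustrative assumptions, not Switch's actual implementation:

```python
# Hedged sketch of the four validation steps using the Databricks Python SDK.
# validate_notebook_with_sdp is a hypothetical helper, not part of Switch.
import io
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.pipelines import (
    NotebookLibrary,
    PipelineLibrary,
    UpdateInfoState,
)
from databricks.sdk.service.workspace import ImportFormat, Language


def validate_notebook_with_sdp(converted_code: str, notebook_path: str) -> list[str]:
    w = WorkspaceClient()

    # Export Notebook: write the converted code to a temporary workspace notebook.
    w.workspace.upload(
        notebook_path,
        io.BytesIO(converted_code.encode("utf-8")),
        format=ImportFormat.SOURCE,
        language=Language.PYTHON,
        overwrite=True,
    )

    # Create Pipeline: a temporary pipeline referencing the notebook.
    created = w.pipelines.create(
        name="tmp-sdp-validation",
        libraries=[PipelineLibrary(notebook=NotebookLibrary(path=notebook_path))],
        development=True,
        serverless=True,  # assumption: compute settings depend on the workspace
    )
    try:
        # Update Pipeline: start a validation-only update and poll until it settles.
        update = w.pipelines.start_update(created.pipeline_id, validate_only=True)
        terminal = {UpdateInfoState.COMPLETED, UpdateInfoState.FAILED, UpdateInfoState.CANCELED}
        while True:
            state = w.pipelines.get_update(created.pipeline_id, update.update_id).update.state
            if state in terminal:
                break
            time.sleep(10)

        # Collect error events, skipping TABLE_OR_VIEW_NOT_FOUND per the note
        # above (Switch's actual matching logic may differ).
        return [
            e.message
            for e in w.pipelines.list_pipeline_events(created.pipeline_id)
            if e.level and e.level.name == "ERROR"
            and "TABLE_OR_VIEW_NOT_FOUND" not in (e.message or "")
        ]
    finally:
        # Delete Pipeline: always clean up the temporary pipeline.
        w.pipelines.delete(created.pipeline_id)
```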

Contributor

@hiroyukinakazato-db hiroyukinakazato-db left a comment

In `docs/lakebridge/docs/transpile/pluggable_transpilers/switch/customizing_switch.mdx`, the Conversion Result Table Schema is missing a column for SDP validation errors. Please add `result_sdp_errors` similar to the existing `result_python_parse_error` and `result_sql_parse_errors` columns.
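
For illustration, the added row might look something like this (the Column/Type/Description layout is an assumption about the existing schema table, not confirmed):

| Column | Type | Description |
|--------|------|-------------|
| `result_sdp_errors` | string | Errors reported during SDP validation, if any |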
