# Update switch documentation for Spark Declarative Pipeline conversion #2174
@@ -18,6 +18,8 @@ This LLM-powered approach excels at converting complex SQL code and business log
than syntactic transformation. While generated notebooks may require manual adjustments, they provide a valuable foundation
for Databricks migration.

Switch can also convert ETL workloads into Spark Declarative Pipelines, supporting both Python and SQL. Refer to the sections below for usage instructions.

---

## How Switch Works

@@ -36,6 +38,7 @@ Switch runs entirely within the Databricks workspace. You can find details about
- **Jobs API**: Executes as scalable Databricks Jobs for batch processing
- **Model Serving**: Direct integration with Databricks LLM endpoints, with concurrent processing for multiple files
- **Delta Tables**: Tracks conversion progress and results
- **Pipelines API**: Creates and executes Spark Declarative Pipelines for pipeline conversion

### 3. Flexible Output Formats
- **Notebooks**: Python notebooks containing Spark SQL (primary output)

@@ -98,6 +101,14 @@ Convert non-SQL files to notebooks or other formats.
| `scala` | Scala Code → Databricks Python Notebook |
| `airflow` | Airflow DAG → Databricks Jobs YAML + Operator conversion guidance (SQL→sql_task, Python→notebook, etc.) |

### Built-in Prompts: ETL Sources

Convert ETL workloads to Spark Declarative Pipelines (SDP) in Python or SQL, as illustrated in the sketch after the table below.

| Source Technology | Source → Target |
|--------------|-----------------|
| `pyspark` | PySpark ETL → Databricks Notebook in Python or SQL for SDP |
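
As a rough illustration of what this conversion involves (a hand-written sketch, not actual Switch output; the table and path names are hypothetical), a small PySpark batch job maps onto SDP in Python along these lines:

```python
# Hypothetical source: a plain PySpark batch job.
#
#   df = spark.read.format("json").load("/raw/orders")
#   df.filter("status = 'shipped'").write.mode("overwrite").saveAsTable("orders_shipped")
#
# A possible SDP (Python) equivalent using the Databricks `dlt` module:

import dlt
from pyspark.sql import functions as F


@dlt.table(comment="Shipped orders loaded from raw JSON")
def orders_shipped():
    # SDP materializes the returned DataFrame as a managed table, so the
    # explicit write/saveAsTable call from the source job is no longer needed.
    # `spark` is provided by the pipeline runtime.
    return (
        spark.read.format("json")
        .load("/raw/orders")
        .filter(F.col("status") == "shipped")
    )
```

With `sdp_language=sql` (see the configuration parameters below), the same logic would instead be emitted as SDP SQL.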

### Custom Prompts: Any Source Format

Switch's LLM-based architecture supports additional conversion types through custom YAML conversion prompts, making it extensible beyond built-in options.
@@ -186,7 +197,7 @@ Additional conversion parameters are managed in the Switch configuration file. Y

| Parameter | Description | Default Value | Available Options |
|-----------|-------------|---------------|-------------------|
| `target_type` | Output format type. `notebook` for Python notebooks with validation and error fixing, `file` for generic file formats, `sdp` for converting ETL workloads to Spark Declarative Pipelines (SDP). See [Conversion Flow Overview](#conversion-flow-overview) for processing differences. | `notebook` | `notebook`, `file`, `sdp` |
| `source_format` | Source file format type. `sql` performs SQL comment removal and whitespace compression preprocessing before conversion. `generic` processes files as-is without preprocessing. Preprocessing affects token counting and conversion quality. See [analyze_input_files](#analyze_input_files) for preprocessing details. | `sql` | `sql`, `generic` |
| `comment_lang` | Language for generated comments. | `English` | `English`, `Japanese`, `Chinese`, `French`, `German`, `Italian`, `Korean`, `Portuguese`, `Spanish` |
| `log_level` | Logging verbosity level. | `INFO` | `DEBUG`, `INFO`, `WARNING`, `ERROR` |

@@ -197,6 +208,7 @@ Additional conversion parameters are managed in the Switch configuration file. Y
| `output_extension` | File extension for output files when `target_type=file`. Required for non-notebook output formats like YAML workflows or JSON configurations. See [File Conversion Flow](#file-conversion-flow) for usage examples. | `null` | Any extension (e.g., `.yml`, `.json`) |
| `sql_output_dir` | (Experimental) When specified, triggers additional conversion of Python notebooks to SQL notebook format. This optional post-processing step may lose some Python-specific logic. See [convert_notebook_to_sql](#convert_notebook_to_sql-optional) for details on the SQL conversion process. | `null` | Full workspace path |
| `request_params` | Additional request parameters passed to the model serving endpoint. Use for advanced configurations like extended thinking mode or custom token limits. See [LLM Configuration](/docs/transpile/pluggable_transpilers/switch/customizing_switch#llm-configuration) for configuration examples including Claude's extended thinking mode. | `null` | JSON format string (e.g., `{"max_tokens": 64000}`) |
| `sdp_language` | Language of the converted SDP code when `target_type=sdp`. | `python` | `python`, `sql` |

---

@@ -316,7 +328,7 @@ flowchart TD

### Notebook Conversion Flow

For `target_type=notebook` or `target_type=sdp`, the `orchestrate_to_notebook` orchestrator executes a comprehensive 7-step processing pipeline:

```mermaid
flowchart TD

    %% @@ -325,8 +337,15 @@ flowchart TD
    subgraph processing ["Notebook Processing Workflow"]
        direction TB
        analyze[analyze_input_files] e2@==> convert[convert_with_llm]

        %% Branch: decide validation path
        convert e8@==>|if SDP| validate_sdp[validate_sdp]
        convert e3@==>|if NOT SDP| validate_nb[validate_python_notebook]

        %% Downstream connections - both validations flow to fix_syntax
        validate_nb e4@==> fix[fix_syntax_with_llm]
        validate_sdp e9@==> fix

        fix e5@==> split[split_code_into_cells]
        split e6@==> export[export_to_notebook]
        export -.-> sqlExport["convert_notebook_to_sql<br>(Optional)"]
    %% @@ -340,13 +359,22 @@ flowchart TD
        export e7@==> notebooks[Python Notebooks]
        sqlExport -.-> sqlNotebooks["SQL Notebooks<br>(Optional Output)"]

        %% SDP validation pipeline operations
        validate_sdp -.-> export_notebook[export notebook]
        export_notebook -.-> create_pipeline[create pipeline]
        create_pipeline -.-> update_pipeline[update pipeline for validation]
        update_pipeline -.-> delete_pipeline[delete pipeline]
    e1@{ animate: true }
    e2@{ animate: true }
    e3@{ animate: true }
    e4@{ animate: true }
    e5@{ animate: true }
    e6@{ animate: true }
    e7@{ animate: true }
    e8@{ animate: true }
    e9@{ animate: true }
```

> **Contributor comment on lines +363 to +367:**
>
> **Suggestion: Move SDP pipeline operations detail to Processing Steps section**
>
> The Notebook Conversion Flow diagram has become quite lengthy after adding the SDP validation pipeline operations. The current diagram includes four extra nodes (export notebook, create pipeline, update pipeline for validation, delete pipeline). Suggestion: remove the SDP pipeline operation details from the main diagram and document them in the `validate_sdp` processing step instead. Proposed change:
>
> ### validate_sdp
>
> Performs Spark Declarative Pipeline validation on the generated code. The validation process executes these steps sequentially:
>
> | Step | Description |
> |------|-------------|
> | Export Notebook | Writes the converted code to a temporary notebook in workspace |
> | Create Pipeline | Creates a temporary Spark Declarative Pipeline referencing the notebook |
> | Update Pipeline | Runs a validation-only update to check for SDP syntax errors |
> | Delete Pipeline | Cleans up the temporary pipeline after validation |
>
> Note: `TABLE_OR_VIEW_NOT_FOUND` errors are ignored during validation.
>
> This approach keeps the main conversion flow diagram concise while preserving the full detail in the Processing Steps section.

### File Conversion Flow

@@ -394,6 +422,10 @@ Loads conversion prompts (built-in or custom YAML) and sends file content to the

### validate_python_notebook
Performs syntax validation on the generated code. Python syntax is checked using `ast.parse()`, while SQL statements within `spark.sql()` calls are validated using Spark's `EXPLAIN` command. Any errors are recorded in the result table for potential fixing in the next step.
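
As a rough sketch of this mechanism (not Switch's actual implementation), a validator in this style could combine `ast.parse()` with `EXPLAIN`:

```python
import ast


def validate_notebook_code(code: str, spark) -> list[str]:
    """Return a list of error descriptions found in the generated code."""
    # 1. Check Python syntax with ast.parse()
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return [f"Python syntax error: {e}"]

    errors = []
    # 2. Validate string literals passed to spark.sql() via EXPLAIN
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "sql"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            stmt = node.args[0].value
            try:
                spark.sql(f"EXPLAIN {stmt}")  # analysis errors surface here
            except Exception as e:
                errors.append(f"SQL error in statement {stmt[:50]!r}: {e}")
    return errors
```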

### validate_sdp
Performs Spark Declarative Pipeline validation on the generated code. A real pipeline is created and a validation-only update is performed.
Note: `TABLE_OR_VIEW_NOT_FOUND` errors are ignored.
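
For illustration, the create/validate/delete cycle can be sketched with the Databricks Python SDK. This is an assumption about one way to implement such a check, not Switch's internal code, and the pipeline and notebook names are hypothetical:

```python
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.pipelines import NotebookLibrary, PipelineLibrary

w = WorkspaceClient()

# Create a temporary pipeline referencing the exported notebook
created = w.pipelines.create(
    name="switch-sdp-validation-tmp",
    libraries=[PipelineLibrary(notebook=NotebookLibrary(path="/tmp/converted_sdp"))],
)
try:
    # Validation-only update: checks the SDP graph without materializing data
    update = w.pipelines.start_update(
        pipeline_id=created.pipeline_id, validate_only=True
    )
    # Poll until the validation update reaches a terminal state
    while True:
        info = w.pipelines.get_update(
            pipeline_id=created.pipeline_id, update_id=update.update_id
        )
        if info.update.state.value in ("COMPLETED", "FAILED", "CANCELED"):
            break
        time.sleep(10)
finally:
    # Always clean up the temporary pipeline
    w.pipelines.delete(pipeline_id=created.pipeline_id)
```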

### fix_syntax_with_llm
Attempts automatic error correction when syntax issues are detected. Sends error context back to the model serving endpoint, which suggests corrections. The validation and fix process repeats up to `max_fix_attempts` times (default: 1) until errors are resolved or the retry limit is reached.
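
Conceptually, the loop looks like this (a minimal sketch; `validate` and `ask_llm_to_fix` are hypothetical stand-ins for the steps above):

```python
def fix_with_retries(code: str, max_fix_attempts: int = 1) -> tuple[str, list[str]]:
    errors = validate(code)                  # e.g., validate_python_notebook or validate_sdp
    attempts = 0
    while errors and attempts < max_fix_attempts:
        code = ask_llm_to_fix(code, errors)  # send error context to the model serving endpoint
        errors = validate(code)              # re-validate the corrected code
        attempts += 1
    return code, errors                      # unresolved errors remain recorded in the result table
```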

> **Contributor comment:**
>
> Could we add a brief note about `unknown_etl`? It would help clarify that ETL types other than those listed above also transpile to Databricks Notebook in Python or SQL for SDP, for example via `unknown_etl`.
>
> Also, just a thought: would it make more sense to rename `unknown_etl` to something like `other_etl`? It might be clearer for users. I understand this would require changes on the Switch side as well, so it's just a suggestion.