# Utility to Export Databricks files (non-data) and notebooks to Oracle AI Data Platform (AIDP)

* Preserves folder structure
* Converts notebooks to `.ipynb`
* No code translation; files are moved as-is
* Optional plain string replacement from a provided mapping; replacement is simple find/replace (no parsing)
* Not intended for data files

Run this notebook from AIDP. The user must have read permission on the Databricks source path and write permission on the AIDP destination path.
## Running the Samples
Before running the notebook, replace the following placeholders with your environment-specific values:

### Required Parameters

* `DATABRICKS_WORKSPACE_URL`: Your Databricks workspace URL
* `DATABRICKS_TOKEN`: Your Databricks personal access token
* `DATABRICKS_PATH`: Source path in the Databricks workspace to export
* `AIDP_PATH`: Target directory path in AIDP
* `dbx_to_aidp_replacement_mappings`: Optional string-replacement map applied during export; a basic use is rewriting path prefixes of referenced files/notebooks (see the example below)
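
For example, a mapping that rewrites a workspace path prefix referenced inside exported notebooks might look like the following sketch (the paths are hypothetical placeholders, not values the utility requires):

```python
# Hypothetical example values; replacement is plain find/replace, no parsing.
dbx_to_aidp_replacement_mappings = {
    "/Workspace/Users/someone@example.com/project": "/aidp/projects/project",
}
```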
## Documentation
### Recursive Export
Traverses nested directory structures in the Databricks workspace.
### Format Preservation
Exports notebooks as Jupyter (`.ipynb`) files and all other files as-is.
### String Replacement
Supports source-to-target string mapping during export. Each mapping entry is applied as a literal find/replace over the file content (see the sketch below).
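
A minimal sketch of the replacement semantics, using made-up strings:

```python
# Minimal sketch of the literal find/replace applied during export
# (hypothetical content and mapping).
content = 'df = spark.read.text("/Workspace/shared/notes.txt")'
mappings = {"/Workspace/shared": "/aidp/shared"}

for src, tgt in mappings.items():
    content = content.replace(src, tgt)  # plain substring replacement, no parsing

print(content)  # df = spark.read.text("/aidp/shared/notes.txt")
```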
### Structure Maintenance
Recreates the original folder hierarchy in AIDP.
### Multiple File Types
Handles both notebooks and regular files.
### No Code Conversion
No code conversion or translation is performed; content is exported as-is.
### Permissions
Requires read permission on the Databricks source path and write permission on the AIDP destination path.
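
A minimal pre-flight sketch for checking both sides, assuming `w`, `databricks_path`, and `aidp_path` are already defined as in the sample code below:

```python
import os

# Hypothetical pre-flight check; assumes `w`, `databricks_path`, and
# `aidp_path` are defined as in the sample code below.
w.workspace.get_status(databricks_path)  # raises if the source path is unreadable or missing
os.makedirs(aidp_path, exist_ok=True)    # raises if the destination is not writable
```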
## Get Support
## Security
Please consult the [security guide](/SECURITY.md) for our responsible security vulnerability disclosure process.
## Contributing
This project welcomes contributions from the community. Before submitting a pull request, please [review our contribution guide](/CONTRIBUTING.md).
"source": "### Sample Code: Exporting Databricks Files to AIDP.\n\nThis example demonstrates how to export files recursively from databricks workspace using `databricks-sdk` Library and write to an **AIDP**.\n\n**Note:** \n\n- Replace all placeholders (e.g., `<DATABRICKS_WORKSPACE_URL>`, `<DATABRICKS_TOKEN>`, `<DATABRICKS_PATH>`, `<AIDP_PATH>` etc.) with values specific to your environment before running the notebook. \n- Provide Source to Target String replacement if you wish to do while importing to AIDP.\n- Use with caution: The notebook is designed for exporting notebooks & code related files only.",
"source": "#Databricks Workspace URL\ndatabricks_workspace_url = \"DATABRICKS_WORKSPACE_URL\"\n#Databricks Token\ndatabricks_token = \"DATABRICKS_TOKEN\"\n# Define the Databricks folder you want to export\ndatabricks_path = \"DATABRICKS_PATH\"\n# Define the local AIDP directory to write the exported content\naidp_path = \"AIDP_PATH\"",
"source": "#Provide Comma Seperated mapping to replace Source String with Target String. These are just string replacement so mapping should be provided carefully.\ndbx_to_aidp_replacement_mappings = {\n\"SOURCE_STR_1\": \"TARGET_STR_1\",\n\"SOURCE_STR_2\": \"TARGET_STR_2\"\n}",
"source": "#Recursively exports a Databricks workspace folder to a local directory, preserving the nested folder structure and exporting notebooks as .ipynb files.\n\ndef export_folder_recursively(databricks_path: str , aidp_path: str , w: WorkspaceClient):\n\n try:\n # List contents of the current workspace path\n contents = w.workspace.list(path=databricks_path)\n except Exception as e:\n print(f\"Failed to list contents of Databricks path {databricks_path}: {e}\")\n return\n\n for item in contents:\n dbx_item_path = item.path\n\n # Determine the relative path to maintain the nested structure\n dbx_relative_path = os.path.relpath(dbx_item_path , databricks_path)\n aidp_full_path = os.path.join(aidp_path , dbx_relative_path)\n\n if item.object_type == workspace.ObjectType.DIRECTORY:\n # Create the local directory and recurse into it\n os.makedirs(aidp_full_path , exist_ok=True)\n print(f\"Created local directory: {aidp_full_path}\")\n export_folder_recursively(dbx_item_path , aidp_full_path , w)\n elif item.object_type == workspace.ObjectType.FILE or item.object_type == workspace.ObjectType.NOTEBOOK:\n file_name = os.path.basename(dbx_item_path)\n if item.object_type == workspace.ObjectType.NOTEBOOK:\n local_file_path = os.path.join(os.path.dirname(aidp_full_path) , f\"{file_name}.ipynb\")\n format = workspace.ExportFormat.JUPYTER\n else:\n local_file_path = os.path.join(os.path.dirname(aidp_full_path) , file_name)\n format = workspace.ExportFormat.SOURCE\n\n try:\n # Export the file/notebook content\n print(f\"Exporting File/Notebook: {dbx_item_path} to {local_file_path}\")\n dbx_file_content = w.workspace.export(\n path=dbx_item_path ,\n format=format\n )\n\n \n binary_content = base64.b64decode(dbx_file_content.content)\n code_string = binary_content.decode('utf-8')\n \n # Iterate through the mapping and replace content\n for dbx_str, aidp_str in dbx_to_aidp_replacement_mappings.items():\n code_string = code_string.replace(dbx_str, aidp_str)\n \n modified_binary_content = code_string.encode('utf-8')\n\n with open(local_file_path , \"wb\") as f:\n f.write(modified_binary_content)\n\n print(f\"Downloaded File: {file_name} as {local_file_path}\")\n\n except Exception as export_error:\n print(f\"Failed to export notebook {dbx_item_path}: {export_error}\")\n\n else:\n print(f\"Skipping unsupported object type: {item.object_type} at {dbx_item_path}\")",
"source": "# Initialize the WorkspaceClient\nw = WorkspaceClient(\n host=databricks_workspace_url ,\n token=databricks_token ,\n)\n\nprint(f\"Starting export from Databricks path '{databricks_path}' to local path '{aidp_path}'\")\n\n# Create AIDP local directory if not exists.\nos.makedirs(aidp_path , exist_ok=True)\n\n# Start the recursive export\nexport_folder_recursively(databricks_path , aidp_path , w)\n\nprint(\"\\nExport process finished.\")",