active/0009-import-export-feature/README.md (27 additions, 5 deletions)
@@ -21,11 +21,15 @@ This feature as it is currently implemented is only intended to support:
## Non-Requirements
-This feature is not intended to add a REST API for import/export like that of Collibra. It is only intended for use through the UI.
+This feature is not intended to add a REST API for import/export like that of Collibra. It is only intended for use through the UI. Additionally, we do not intend for this feature to be used to import datasets built from scratch. The feature is only intended to import CSV files that have been previously exported from DataHub.
## Detailed design
-This feature will add three new options to the existing `SearchExtendedMenu` dropdown. One to export all datasets within a container, one to export individual datasets, and one to import previously exported data into DataHub. The export options create CSV files from data existing in DataHub, while the import option adds new data to DataHub from CSV files.
+This feature will add three new options to the existing `SearchExtendedMenu` dropdown, as can be seen in figure 1. The first option exports all datasets within a container, the second exports individual datasets, and the third is used to import previously exported data into DataHub. The export options create CSV files from data existing in DataHub, while the import option adds new data to DataHub from CSV files.
Below is a list of the column names used in the CSV files for this feature. Within the CSV files, each row describes an individual dataset or schema field.
@@ -49,15 +53,23 @@ Here is information on how these CSV columns are used, and how the data stored w
Within the `SearchExtendedMenu` dropdown, the container-level export option is only available when a container is being viewed. At all other times, it is grayed out and cannot be pressed. This is done using a React effect, which grays out the button unless the URL of the current page contains the word "container".
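The URL check described above can be sketched as a pair of pure functions, kept separate from React so they are easy to test. This is only an illustration under stated assumptions, not the code from this PR: `isContainerPage` and `isContainerExportDisabled` are hypothetical names, and the sketch assumes the effect does nothing more than test whether the current URL contains the substring "container".

```typescript
// Hypothetical helper (not from the PR): true when the URL contains
// the word "container", which is the condition the text describes.
function isContainerPage(url: string): boolean {
  return url.includes('container');
}

// Hypothetical helper: the container-level export option is disabled
// (grayed out) on every page that is not a container page.
function isContainerExportDisabled(url: string): boolean {
  return !isContainerPage(url);
}
```

In a component, a function like this would presumably be called from a `useEffect` that re-runs when the location changes. Note that a plain substring test also matches any non-container page whose URL happens to include "container", for example in a search query.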
-When either export option is selected, it opens a modal which prompts the user to enter the name of the CSV file to be created. For dataset-level export, the user is also prompted to enter the data source, database, schema, and table name of the dataset to be exported. Notably, these fields assume a specific number of containers to be present, which may not be the case for every data source. As such, this modal may need to be altered. This is what the fields presently refer to:
+When either export option is selected, it opens a modal which prompts the user to enter the name of the CSV file to be created (see figures 2 and 3). For dataset-level export, the user is also prompted to enter the data source, database, schema, and table name of the dataset to be exported. Notably, these fields assume a specific number of containers to be present, which may not be the case for every data source. As such, this modal may need to be altered. This is what the fields presently refer to:
- Data source: The name of the data platform containing the dataset.
- Database: A container representing a database within the data source.
- Schema: A container representing a schema within the source database.
-1. The modal is made invisible, but continues executing code for the export process. A notification is created to inform the user that the export process is ongoing.
+1. The modal is made invisible, but continues executing code for the export process. A notification is created to inform the user that the export process is ongoing (see figure 4).
2. The URN of the dataset or container is determined, by either:
   - Pulling from [`EntityContext`](https://github.com/datahub-project/datahub/blob/master/datahub-web-react/src/app/entity/shared/EntityContext.ts) in the case of container-level export.
   - Manually constructing the URN from data entered into the modal in the case of dataset-level export.
@@ -67,13 +79,17 @@ Upon entry, the following steps occur:
4. The metadata returned from the GraphQL query is transformed into a CSV-compatible JSON object using a shared function, `convertToCSVRows`. Each row in this JSON object contains the columns described in the prior section.
5. The existing `downloadRowsAsCsv` function in [`csvUtils`](https://github.com/datahub-project/datahub/blob/master/datahub-web-react/src/app/search/utils/csvUtils.ts) is used to create the download.
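The manual URN construction in step 2 could look roughly like the sketch below. It is not taken from this PR: `DatasetLocation` and `buildDatasetUrn` are hypothetical names, and the sketch assumes that the dataset name is the dot-joined `database.schema.table` path and that the environment defaults to `PROD`. Only the overall `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` shape is DataHub's standard dataset URN format.

```typescript
// Fields collected by the dataset-level export modal.
// The interface and its property names are hypothetical.
interface DatasetLocation {
  dataSource: string; // name of the data platform, e.g. "snowflake"
  database: string;   // container representing the database
  schema: string;     // container representing the schema
  table: string;      // name of the dataset itself
}

// Assumptions: the dataset name is "database.schema.table" and the
// environment is PROD unless the caller says otherwise.
function buildDatasetUrn(loc: DatasetLocation, env: string = 'PROD'): string {
  const name = [loc.database, loc.schema, loc.table].join('.');
  return `urn:li:dataset:(urn:li:dataPlatform:${loc.dataSource},${name},${env})`;
}
```

Because the name is built from exactly three parts, a helper like this shares the limitation noted for the modal: data sources with more or fewer container levels would need a different construction.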
The `papaparse` library is used to parse the CSV file and iterate over each row present within it. The data is then fed into GraphQL mutations to create datasets. Notably, a new GraphQL mutation had to be created to allow the upserting of schema metadata. Here is the specification for that new mutation: