Commit b4172a0

rfc: Update document with requested changes (#4)

* rfc: Update document with requested changes
* Fix typo on line 28

1 parent 562d1a9 commit b4172a0

6 files changed

active/0009-import-export-feature/README.md: 27 additions & 5 deletions
@@ -21,11 +21,15 @@ This feature as it is currently implemented is only intended to support:

## Non-Requirements

-This feature is not intended to add a REST API for import/export like that of Collibra. It is only intended for use through the UI.
+This feature is not intended to add a REST API for import/export like that of Collibra. It is only intended for use through the UI. Additionally, we do not intend for this feature to be used to import datasets built from scratch. The feature is only intended to import CSV files that have been previously exported from DataHub.

## Detailed design

-This feature will add three new options to the existing `SearchExtendedMenu` dropdown. One to export all datasets within a container, one to export individual datasets, and one to import previously exported data into DataHub. The export options create CSV files from data existing in DataHub, while the import option adds new data to DataHub from CSV files.
+This feature will add three new options to the existing `SearchExtendedMenu` dropdown, as can be seen in figure 1. The first option exports all datasets within a container, the second exports individual datasets, and the third is used to import previously exported data into DataHub. The export options create CSV files from data existing in DataHub, while the import option adds new data to DataHub from CSV files.
+
+| ![Figure 1: Search extended menu](search_extended_menu.png "Figure 1") |
+|:--:|
+| *Figure 1: Search extended menu* |

Below is a list of the column names used in the CSV files for this feature. Within the CSV files, each row describes an individual dataset or schema field.

@@ -49,15 +53,23 @@ Here is information on how these CSV columns are used, and how the data stored w

Within the `SearchExtendedMenu` dropdown, the container-level export option is only available when a container is being viewed. At all other times, it is grayed out and cannot be pressed. This is done using a React effect, which grays out the button unless the URL of the current page contains the word "container".

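The "grayed out unless viewing a container" behavior boils down to a simple URL predicate. A minimal sketch, where the helper name is hypothetical rather than taken from the actual implementation:

```typescript
// Hypothetical helper illustrating the check described above; the real code
// runs this inside a React effect that toggles the button's disabled state.
function isContainerPage(url: string): boolean {
    // The container-level export option is enabled only when the word
    // "container" appears in the current page's URL.
    return url.includes("container");
}
```

In the component, a `useEffect` watching the current location would call such a predicate and disable the container-level export button whenever it returns false.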
-When either export option is selected, it opens a modal which prompts the user to enter the name of the CSV file to be created. For dataset-level export, the user is also prompted to enter the data source, database, schema, and table name of the dataset to be exported. Notably, these fields assume a specific number of containers to be present, which may not be the case for every data source. As such, this modal may need to be altered. This is what the fields presently refer to:
+When either export option is selected, it opens a modal which prompts the user to enter the name of the CSV file to be created (see figures 2 and 3). For dataset-level export, the user is also prompted to enter the data source, database, schema, and table name of the dataset to be exported. Notably, these fields assume a specific number of containers to be present, which may not be the case for every data source. As such, this modal may need to be altered. This is what the fields presently refer to:
- Data source: The name of the data platform containing the dataset.
- Database: A container representing a database within the data source.
- Schema: A container representing a schema within the source database.
- Table name: The name of the dataset.

+| ![Figure 2: Dataset download modal](download_dataset_modal.png "Figure 2") |
+|:--:|
+| *Figure 2: Dataset download modal* |
+
+| ![Figure 3: Schema download modal](download_schema_modal.png "Figure 3") |
+|:--:|
+| *Figure 3: Schema download modal* |
+
Upon entry, the following steps occur:

-1. The modal is made invisible, but continues executing code for the export process. A notification is created to inform the user that the export process is ongoing.
+1. The modal is made invisible, but continues executing code for the export process. A notification is created to inform the user that the export process is ongoing (see figure 4).
2. The URN of the dataset or container is determined, by either:
   - Pulling from [`EntityContext`](https://github.com/datahub-project/datahub/blob/master/datahub-web-react/src/app/entity/shared/EntityContext.ts) in the case of container-level export.
   - Manually constructing the URN from data entered into the modal in the case of dataset-level export.
@@ -67,13 +79,17 @@ Upon entry, the following steps occur:
4. The metadata returned from the GraphQL query is transformed into a CSV-compatible JSON object using a shared function, `convertToCSVRows`. Each row in this JSON object contains the columns described in the prior section.
5. The existing `downloadRowsAsCsv` function in [`csvUtils`](https://github.com/datahub-project/datahub/blob/master/datahub-web-react/src/app/search/utils/csvUtils.ts) is used to create the download.
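For the dataset-level path, the manual URN construction in step 2 can be sketched as follows. DataHub dataset URNs take the form `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`; the function name and the assumption that the four modal fields join into a dot-separated dataset name are illustrative, not the actual implementation:

```typescript
// Illustrative sketch only: builds a DataHub dataset URN from the four modal
// fields. The helper name and the name-joining convention are assumptions.
function buildDatasetUrn(
    dataSource: string,
    database: string,
    schema: string,
    tableName: string,
    env: string = "PROD",
): string {
    // Dataset names for warehouse-style sources are typically dot-separated.
    const name = `${database}.${schema}.${tableName}`;
    return `urn:li:dataset:(urn:li:dataPlatform:${dataSource},${name},${env})`;
}
```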

+| ![Figure 4: download notification](downloading_schema.png "Figure 4") |
+|:--:|
+| *Figure 4: Download notification* |
+
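The shape of steps 4 and 5 can be sketched as a plain rows-to-CSV serialization. This is an illustration of the idea, not the actual `convertToCSVRows`/`downloadRowsAsCsv` code, and the column names used in it are hypothetical:

```typescript
// Illustrative only: flat row objects (one per dataset or schema field) are
// serialized into CSV text, which the browser then downloads as a file.
type CsvRow = Record<string, string>;

function rowsToCsv(rows: CsvRow[], columns: string[]): string {
    // Quote cells containing commas, quotes, or newlines, per RFC 4180.
    const escape = (v: string) =>
        /[",\n]/.test(v) ? `"${v.replace(/"/g, '""')}"` : v;
    const header = columns.join(",");
    const body = rows.map((r) => columns.map((c) => escape(r[c] ?? "")).join(","));
    return [header, ...body].join("\n");
}
```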
#### GraphQL queries

These GraphQL queries are used for container-level export and dataset-level export, respectively:

``` graphql
query getDatasetByUrn($urn: String!, $start: Int!, $count: Int!) {
-  search(input: { type: DATASET, query: $urn, start: $start, count: $count }) {
+  search(input: { type: DATASET, query: "*", orFilters: [{and: [{field: "container", values: [$urn]}]}], start: $start, count: $count }) {
    start
    count
    total
@@ -259,6 +275,12 @@ In the case of import, the button first opens a prompt to upload a file, using t
<input id="file" type="file" onChange={changeHandler} style={{ opacity: 0 }} />
```

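The `changeHandler` on this hidden input receives the chosen file, whose contents are then parsed row by row. The actual feature uses the `papaparse` library, which handles quoting, escaping, and streaming; this naive split-based sketch is for illustration only, and all names in it are hypothetical:

```typescript
// Illustration only: turn CSV text into one object per row, keyed by the
// header columns (the same row shape papaparse produces with `header: true`).
function parseCsvRows(csv: string): Record<string, string>[] {
    const [headerLine, ...lines] = csv.trim().split("\n");
    const headers = headerLine.split(",");
    return lines
        .filter((line) => line.length > 0) // skip empty lines
        .map((line) => {
            const cells = line.split(",");
            const row: Record<string, string> = {};
            headers.forEach((h, i) => {
                row[h] = cells[i] ?? "";
            });
            return row;
        });
}
```

Each resulting row object would then be fed into the GraphQL mutations that create datasets and upsert schema metadata.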
+After the user has chosen a file for upload, a notification is shown to inform the user that the upload is in progress, as can be seen in figure 5.
+
+| ![Figure 5: import notifications](import_notification.png "Figure 5") |
+|:--:|
+| *Figure 5: Import notifications* |
+
The `papaparse` library is used to parse the CSV file and iterate over each row present within it. The data is then fed into GraphQL mutations to create datasets. Notably, a new GraphQL mutation had to be created to allow the upserting of schema metadata. Here is the specification for that new mutation:

5 binary files changed (170 KB, 141 KB, 146 KB, 568 KB, 164 KB)

0 commit comments
