diff --git a/DataPreparation.md b/DataPreparation.md index 3978d1b..94af0a9 100644 --- a/DataPreparation.md +++ b/DataPreparation.md @@ -1,4 +1,4 @@ -# Data Preparation process +# 5. Data Preparation process Data compliance with different tier levels can be performed progressively. For all three tiers, the process starts with the @@ -7,115 +7,15 @@ steps of de-identification and re-identification risk assessment, quality check and standardization. The details of the steps will be provided in the following sections, but the outline is the following: - ----- - - - - - - - - - - - - - - - - - - - - - - - - -
-| Requirement for | Dataset remains on premises | Dataset is exported to a reference node |
-| --- | --- | --- |
-| Tier 1 compliance | • Dataset must be registered in the public catalogue.<br>• Image and clinical data must be linked using a single, consistent patient identifier (patientID), preserved across all preparation steps.<br>• No entity (e.g. patient, observation, study, series) may be duplicated within the dataset. | • Dataset must be registered in the public catalogue.<br>• Image and clinical data must be linked using a single, consistent patient identifier (patientID), preserved across all preparation steps.<br>• No entity (e.g. patient, observation, study, series) may be duplicated within the dataset.<br>• De-identification and quality check is required prior to transfer.<br>• Imaging data must be accompanied by a set of minimum clinical metadata. Only-imaging datasets, with imaging attributes only, will be considered case-by-case before acceptance in the platform.<br>• To transfer the data to a reference node, format for images should be preferably DICOM objects. NIfTI could be also handled by both reference nodes (add link to instructions as ref). |
-| Tier 2 compliance | • Compliance with Tier 1 requirements<br>• The metadata required for the federated search must be standardized and semantically aligned with the EUCAIM hyper-ontology.<br>• Compliance with the EUCAIM Common Data Model (CDM) is recommended but not mandatory. If the data is not transformed to the EUCAIM CDM, you must instead implement a mapping component that translates local data to the searchable variables required by the federated search.<br>• A query service component should be installed to run the search. | • Compliance with Tier 1 requirements<br>• The metadata required for the federated search must be standardized and semantically aligned with the EUCAIM hyper-ontology.<br>• Compliance with the EUCAIM Common Data Model (CDM) is recommended but not mandatory. If the data is not transformed to the EUCAIM CDM, you must instead implement a mapping component that translates local data to the searchable variables required by the federated search. |
-| Tier 3 compliance | • Compliance with Tier 1 and Tier 2 requirements<br>• Provide imaging data in DICOM format; associated annotations and segmentations, when available, must be in DICOM-SEG format. Exceptions may be considered for diagnostic images in other formats, on a case-by-case basis.<br>• Full compliance with the EUCAIM Common Data Model (CDM) is required.<br>• Organize imaging and clinical data following the EUCAIM common file structure.<br>• Materialize imaging and clinical metadata according to the EUCAIM CDM.<br>• Data should be integrated into the materializer component. | • Compliance with Tier 1 and Tier 2 requirements<br>• Provide imaging data in DICOM format; associated annotations and segmentations, when available, must be in DICOM-SEG format. Exceptions may be considered for diagnostic images in other formats, on a case-by-case basis.<br>• Full compliance with the EUCAIM Common Data Model (CDM) is required.<br>• Organize imaging and clinical data following the EUCAIM common file structure.<br>• Materialize imaging and clinical metadata according to the EUCAIM CDM.<br>• Data should be integrated into the materializer component. |
- -Minimum metadata requirements for the imaging and -clinical data: + +| Requirement for | Dataset remains on premises | Dataset is exported to a reference node | +| ----------- | ----------- | ----------------- | +| **Tier 1 compliance** | | | +| **Tier 2 compliance** | | | +| **Tier 3 compliance** | | | + + +## **Minimum metadata requirements for the imaging and clinical data:** ### Minimum imaging attributes (from DICOM metadata) @@ -189,11 +89,10 @@ For negative screening/control groups, **region and laterality are not mandatory *Values should preferably be provided at the imaging level using DICOM tags. If identical for all studies, they may be provided once at dataset level.* -## **Data preparation and related tools from the EUCAIM catalogue** +## 5.1. Data preparation and related tools from the EUCAIM catalogue For the purpose of data preparation, several tools have been selected -and developed in EUCAIM. [Figure -7](https://eucaim.gitbook.io/handbook/datapreparation#fig_datatools) +and developed in EUCAIM. [Figure 7](#fig_datatools) shows the main tools selected for this phase. ***Use of EUCAIM-provided tools*** @@ -204,9 +103,10 @@ may choose to employ their own tools if they are more comfortable with them. The data preparation processes might slightly require different tools depending on their specific requirements and intended tier level. Please read the sections below carefully. EUCAIM -technical support team can assist you +technical support team can assist you throughout this process via the Helpdesk. +### | | | |---|---| | ![https://bio.tools/mitk](figures/mitk.png) | ![https://hub.docker.com/r/mariov687/dicomseg](figures/seg-convert.png) | @@ -216,8 +116,7 @@ throughout this process via the Helpdesk. 
| ![https://bio.tools/eetl_toolset](figures/etl.png) | ![https://bio.tools/data_integration_quality_check_tool_diqct](figures/diqct.png)| | ![https://bio.tools/image_duplicate_check_tool](figures/dupl-check-tool.png) | ![https://bio.tools/dicom_image_similarity-duplicate_checker](figures/dupl-check.png)| -[Figure -7](https://eucaim.gitbook.io/handbook/datapreparation#figur_datatools): +[Figure 7](#fig_datatools): EUCAIM data preparation tools for data holders. Click on the thumbnail for more information about the tool. @@ -236,46 +135,38 @@ The binaries of the tools can be downloaded from: #### Access to the EUCAIM Software artifacts registry (Harbor) -([https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories)) +([https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories)) The access to the registry requires a valid account and additional -permissions that can be requested on the first access to the registry. Instructions on how to request access and download tools are available [here -](https://drive.eucaim.cancerimage.eu/s/pxpTJWSTFsLbqPQ?dir=/&editing=false&openfile=true)\. +permissions that can be requested on the first access to the registry. Instructions on how to request access and download tools are available [here](https://drive.eucaim.cancerimage.eu/s/pxpTJWSTFsLbqPQ?dir=/&editing=false&openfile=true). It is advisable that once data holders request access to the registry, they open a ticket in the EUCAIM helpdesk - in the enrollment group - to speed up the process of approval (only data holders and project members can download the tools). -Below is a step-by-step guide on how to access +Below is a step-by-step guide on how to access the Harbor repository and download the required tools. 
- - - #### Access to the EUCAIM drive repository -([https://drive.eucaim.cancerimage.eu/apps/files/files/1520?dir=/Applications](https://drive.eucaim.cancerimage.eu/apps/files/files/1520?dir=/Applications)) +([https://drive.eucaim.cancerimage.eu/apps/files/files/1520?dir=/Applications](https://drive.eucaim.cancerimage.eu/apps/files/files/1520?dir=/Applications)) -## **Tier 1 datasets** +## 5.2. Tier 1 datasets -### **Steps to prepare your Tier 1 dataset for transfer to a reference node** +### **Steps to prepare your Tier 1 dataset for transfer to a reference node** The preparation of your dataset will follow four steps – image annotation (optional), de-identification, data quality check, and data transfer – as described below: - +![Figure 8. Step-wise preparation of Tier 1 dataset to be transferred to a reference node.](figures/step-prep-Tier1.png) -**Figure 8**: Step-wise preparation of Tier 1 dataset -to be transferred to a reference node. #### **Step 1: Image annotation (optional)** You may want to annotate your imaging data to enrich the quality of your dataset. -Tools: We recommend using the [**MITK -(Medical Imaging Interaction Toolkit) -Workbench**](https://bio.tools/mitk), which ensures the output +Tools: We recommend using the [**MITK (Medical Imaging Interaction Toolkit) Workbench**](https://bio.tools/mitk), which ensures the output format will be in the required format to be compliant with EUCAIM. Using it would avoid the burden (and the risk) of additional conversion procedures. Data can be also annotated using the DICOM Viewers from @@ -283,8 +174,9 @@ reference node environments after transferring the data. **Format standardization (optional)**: it is recommended that your imaging raw data are in DICOM format, and that your annotations are in -DICOM-SEG.\ -Tools: If you have existing annotation files +DICOM-SEG. 
+ +Tools: If you have existing annotation files that are not in DICOM-SEG, you may use the EUCAIM [**Annotation Seg converter**](https://hub.docker.com/r/mariov687/dicomseg) tool to convert them. @@ -292,7 +184,7 @@ convert them. #### **Step 2: De-identification** You must ensure that no identifiable information (direct or indirect) is -present in the dataset you will share (Figure 9). +present in the dataset you will share ([Figure 9](#fig_dataanon)). ***Important points to consider before de-identification*** @@ -302,16 +194,14 @@ preparing a tabular file associating StudyUIDs from DICOM images with corresponding clinical “episode” and “timepoint events”, in case the dataset contains multiple episode/timepoints. -Tools: This can be done using the [**DICOM -tags extractor**](https://bio.tools/dicom_tags_extractor) tool -(Figure 7). For more information, see further below section -[5.3.3.2](#bookmark=id.e3irrt7bxs08) Step 2 on imaging data +Tools: This can be done using the [**DICOM tags extractor**](https://bio.tools/dicom_tags_extractor) tool +([Figure 7](#fig_datatools)). For more information, see further below section +[Step 2](#step-2-imaging-correspondence-with-clinical-data) on imaging data preparation. If your imaging data are not already de-identified, you may use the -[**Lethe EUCAIM -Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer/) -(Figure 7). In this case, you must ensure the following: +[**Lethe EUCAIM Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer/) +([Figure 7](#fig_datatools)). 
In this case, you must ensure the following: - the patient ID linking clinical and imaging data must be identical and listed as the first variable in the clinical dataset for tabular data; @@ -321,51 +211,44 @@ Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/reposit - the tool requires as input the SITE_ID, the unique identifier of the data provider, which you can see in your user profile from the [EUCAIM Dashboard](https://dashboard.eucaim.cancerimage.eu/) - ([Figure](https://eucaim.gitbook.io/handbook/datapreparation#fig_dataanon) - 9). In case your Life Science account is not + ([Figure 9](#fig_dataanon)). In case your Life Science account is not assigned to a known organization, then this will be empty and so you can create a ticket in the Helpdesk to request one; Special attention must be given to **embedded text** in images, which may contain patient-identifiable information, as well as **craniofacial images** that pose a risk of patient re-identification. You may need to -apply additional de-identification techniques to mitigate this risk.\ -Tools: Tools such as the [**DICOM defacing -anonymisation**](https://bio.tools/dicom_defacing_anonymation) tool -from the EUCAIM catalogue (Figure 7) may be used to remove facial +apply additional de-identification techniques to mitigate this risk. + +Tools: Tools such as the [**DICOM defacing anonymisation**](https://bio.tools/dicom_defacing_anonymation) tool +from the EUCAIM catalogue ([Figure 7](#fig_datatools)) may be used to remove facial features from your DICOM images. For 2D ultrasounds and mammography -**dataset**, you may use the [**Trace4MedicalImage -cleaning**](https://bio.tools/trace4medicalimagecleaning) tool, that -detects and removes encapsulated text in DICOM files. 
[The Lethe -EUCAIM -Anonymizer](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer) +**dataset**, you may use the [**Trace4MedicalImage cleaning**](https://bio.tools/trace4medicalimagecleaning) tool, that +detects and removes encapsulated text in DICOM files. [The Lethe EUCAIM Anonymizer](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer) tool also provides options to remove burned-in PHI pixel data from the images. **Re-identification risk assessment (optional)**: Even if no automatic re-identification risk analysis on a combination of clinical and imaging metadata is possible at this Tier, you should carefully assess that no -direct or indirect identifiers are present in your data.\ -Tools: For assessing the risk of +direct or indirect identifiers are present in your data. + +Tools: For assessing the risk of re-identification of patients based on your **imaging metadata** before sharing your dataset, you may use the [EUCAIM **Wizard tool**](https://bio.tools/eucaim_wizard_tool). Extraction of imaging -metadata to feed the wizard tool is possible by using the [**DICOM -tags extractor**](https://bio.tools/dicom_tags_extractor) tool -(Figure -[7](https://eucaim.gitbook.io/handbook/datapreparation#fig_dataanon)). -You may also use the [ARX Anonymization -Tool](https://bio.tools/arx) to assess the re-identification risk of +metadata to feed the wizard tool is possible by using the [**DICOM tags extractor**](https://bio.tools/dicom_tags_extractor) tool +([Figure 7](#fig_datatools)). +You may also use the [ARX Anonymization Tool](https://bio.tools/arx) to assess the re-identification risk of your clinical metadata, but it requires the specification of the quasi-identifier attributes by the DH. In addition, the creation of generalization hierarchies is necessary if you want to perform a utility–risk trade-off analysis and apply appropriate risk-mitigation strategies. 
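To make the role of quasi-identifiers in the paragraph above more concrete, the sketch below counts how many records share each combination of quasi-identifier values and flags the records that fall into groups smaller than a chosen threshold. It is only a rough illustration of the idea, not what ARX or the EUCAIM Wizard tool actually compute; the column names (`Sex`, `AgeGroup`, `Region`), the sample rows and the threshold `k` are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical clinical table; the column names are assumptions for illustration.
SAMPLE_CSV = """PatientID,Sex,AgeGroup,Region
P1,F,60-69,Breast
P2,F,60-69,Breast
P3,M,40-49,Lung
"""

def group_sizes(rows, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)

def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group; a value of 1 means some record is unique."""
    sizes = group_sizes(rows, quasi_identifiers)
    return min(sizes.values()) if sizes else 0

def records_below_threshold(rows, quasi_identifiers, k):
    """Records whose quasi-identifier combination is shared by fewer than k records."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] < k
    ]

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
quasi_identifiers = ["Sex", "AgeGroup", "Region"]  # chosen by the data holder

print("Smallest group size:", smallest_group_size(rows, quasi_identifiers))
for row in records_below_threshold(rows, quasi_identifiers, k=2):
    print("Potentially re-identifiable record:", row["PatientID"])
```

Records flagged this way are candidates for generalization (for example, wider age bands) before sharing; the dedicated tools remain the reference for the actual risk assessment.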
- +### -> **Figure 9: Retrieving SITE ID from the Dashboard.** +

Figure 9. Retrieving SITE ID from the Dashboard.

#### **Step 3: Data quality check** @@ -386,7 +269,7 @@ dataset is**: You may use dedicated tools to assess the degree of compliance of your dataset to these principles. -Tools: Some tools from the EUCAIM catalogue +Tools: Some tools from the EUCAIM catalogue can help you to assess the degree of compliance of your dataset to each EUCAIM DQ dimension: @@ -406,26 +289,24 @@ EUCAIM DQ dimension: #### **Step 4: Data transfer** Tier 1 datasets can either be transferred to a reference node, or remain -at your site. If your dataset remains on site, any -data users interested in your dataset (as per the information +at your site. If your dataset remains on site, any +data users interested in your dataset (as per the information found in the EUCAIM catalogue) will be put in direct contact with you. If you wish to transfer your dataset to a reference node, please refer to Section 6 of the Handbook for further information. -## **Tiers 2 & 3 datasets** +## 5.3. Tiers 2 & 3 datasets ### **EUCAIM Common Data Model and Hyperontology** -The [**EUCAIM Common Data -Model**](https://eucaim.gitbook.io/eucaim-common-data-model/1.-introduction) +The [**EUCAIM Common Data Model**](https://eucaim.gitbook.io/eucaim-common-data-model/1.-introduction) defines a standardized structure for representing clinical and imaging metadata across the EUCAIM platform. It ensures that data contributed by different partners can be understood and used in a consistent way. **Key features:** -- It is based on the conceptual model of [mCode - specification](https://ascopubs.org/doi/10.1200/CCI.20.00059) +- It is based on the conceptual model of [mCode specification](https://ascopubs.org/doi/10.1200/CCI.20.00059) - The current version of the EUCAIM CDM Data Dictionary is available [here](https://docs.google.com/spreadsheets/d/1ox9PdvfCDxpDmEnFzC1M6OFhUhXpjQzg/edit?usp=sharing&ouid=115998150174651530097&rtpof=true&sd=true). @@ -435,8 +316,7 @@ different partners can be understood and used in a consistent way. 
- Facilitates efficient querying, tool compatibility, and federated analysis and learning. -The [**EUCAIM** -**hyperontology**](https://hyperontology.eucaim.cancerimage.eu/) +The [**EUCAIM hyperontology**](https://hyperontology.eucaim.cancerimage.eu/) is a common semantic meta-model that supports and maintains semantic interoperability and ensures consistent mapping and harmonization with the EUCAIM CDM entities (tables and attributes). It provides rich @@ -468,7 +348,7 @@ for: The preparation of your dataset will follow the 7 steps as described above: - +![Figure 10. Steps recommended to prepare your Tier 2 or Tier 3 ](figures/step-prep-Tier2-3.png) #### **Step 1: Clinical data structuring** @@ -550,7 +430,7 @@ of variables available in your dataset. 3. Separate all episodes into different tabs as described above, except for Diagnosis that belongs to the Overarching episode. -Note: episodes may correspond to the following: Treatment, Progression, +\*Note: episodes may correspond to the following: Treatment, Progression, Relapse, Remission, Active Surveillance. 4. For each variable of your dataset, find the corresponding entity and @@ -581,7 +461,7 @@ be merged on both columns." Example: in the Overarching episode tab, column K, the “Histological type” variable strictly follows the SNOMEDCT standard; line 4 specifies -“SNOMEDCT”, and an example value is provided on line 5.\ +“SNOMEDCT”, and an example value is provided on line 5. Important: both information must be separated by a comma, without space - if the variable follows specific standard with in-house coding or @@ -623,7 +503,7 @@ the clinical information you provide, especially the timepoints of each episode, we need to retrieve the correspondence between each imaging study and each clinical episode. 
-***Before de-identification of your dataset\****, please create a +***Before de-identification of your dataset***, please create a tabular csv file that contains the following information: - **PatientID** - the exact one from your DICOM images (attribute @@ -632,68 +512,27 @@ tabular csv file that contains the following information: - **StudyUID** - the exact one from your DICOM images (attribute (0020,000D)) -\*Note : if your dataset is already anonymized, you can still use +\*Note: if your dataset is already anonymized, you can still use the DICOM tags extraction tool to provide the file, proceed with step 2 and skip step 3. It is important that you can still link the (anonymized) PatientID with the episodes and timepoints. -Tools: To assist you retrieving all PatientID -and StudyUID from your imaging dataset, you may use the [**DICOM tags -extractor tool**](https://bio.tools/dicom_tags_extractor) and its +Tools: To assist you retrieving all PatientID +and StudyUID from your imaging dataset, you may use the [**DICOM tags extractor tool**](https://bio.tools/dicom_tags_extractor) and its “dicom_tags_selection” script. A template csv input file called “imaging_studies_episodes.csv”, provided with the tool, allows to retrieve the following attributes from your imaging dataset (cf tool -documentation): PatientID, StudyUID, StudyDate, Study description (Table -4). - - ------ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| PatientID (0010,0020) | StudyUID (0020,000D) | StudyDate (0008,0020) | StudyDescription (0008,1030) |
-| --- | --- | --- | --- |
-| ABC-000103 | 1.2.824.0.2.3886579.08.383.1010.6135 | 2018-12-11 | Whole Body I-131 CT |
-| ABC-000103 | 1.2.824.0.2.4653289.08.563.1010.4679 | 2018-12-23 | Screening-Bilateral Mammography |
-| ABC-000103 | 1.2.824.0.2.06135249.08.647.2304.7961 | 2019-01-13 | I131 high dose |
-| ABC-000107 | 1.2.824.0.2.4862015.07.383.5623.6820 | 2017-05-17 | Bilat Mammography |
- -**Table 4: Example output file of the dicom_tags_selection script.** The +documentation): PatientID, StudyUID, StudyDate, Study description [Table 4](#tab_dicom_tags_selection). + +### +| **PatientID (0010,0020)** | **StudyUID (0020,000D)** | **StudyDate (0008,0020)** | **StudyDescription (0008,1030)** | +| ------------------------- | ------------------------------------------------------------------- | ------------------------- | -------------------------------- | +| ABC-000103 | 1.2.824.0.2.3886579.08.383.1010.6135 | 2018-12-11 | Whole Body I-131 CT | +| ABC-000103 | 1.2.824.0.2.4653289.08.563.1010.4679 | 2018-12-23 | Screening-Bilateral Mammography | +| ABC-000103 | 1.2.824.0.2.06135249.08.647.2304.7961 | 2019-01-13 | I131 high dose | +| ABC-000107 | 1.2.824.0.2.4862015.07.383.5623.6820 | 2017-05-17 | Bilat Mammography | + +[Table 4](#tab_dicom_tags_selection): Example output file of the dicom_tags_selection script. The StudyDate, and StudyDescription in Study are provided for indication only, to guide you for the mapping of each study to each episode (see step 2). @@ -708,77 +547,25 @@ You then need to edit the output file by adding the “Episode” and - **Timepoint** - As there can be multiple imaging procedures per episode, please number all studies in ascending order (1, 2, 3,…). - Note : the numbering only concerns imaging procedures, not any other + \*Note: the numbering only concerns imaging procedures, not any other procedure in between. - -------- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| PatientID (0010,0020) | StudyUID (0020,000D) | StudyDate (0008,0020) | StudyDescription (0008,1030) | Episode | Imaging Timepoint |
-| --- | --- | --- | --- | --- | --- |
-| ABC-000103 | 1.2.824.0.2.3886579.08.383.1010.6135 | 2018-12-11 | Whole Body I-131 CT | Diagnosis | 1 |
-| ABC-000103 | 1.2.824.0.2.4653289.08.563.1010.4679 | 2018-12-23 | Screening-Bilateral Mammography | Diagnosis | 2 |
-| ABC-000103 | 1.2.824.0.2.06135249.08.647.2304.7961 | 2019-01-13 | I131 high dose | Treatment | 3 |
-| ABC-000107 | 1.2.824.0.2.4862015.07.383.5623.6820 | 2017-05-17 | Bilat Mammography | Diagnosis | 1 |
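The Imaging Timepoint column in the example above could be pre-filled with a short script before the Episode column is completed by hand. The sketch below is a minimal illustration using only the Python standard library: it assumes the extracted file has the four column names shown in the example, and it assumes that studies are numbered per patient in ascending StudyDate order across episodes, which is inferred from the example rows rather than stated as a rule. The function name `add_timepoints` is hypothetical and not part of the DICOM tags extractor tool.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the example table; the order is shuffled on purpose.
SAMPLE_CSV = """PatientID,StudyUID,StudyDate,StudyDescription
ABC-000103,1.2.824.0.2.06135249.08.647.2304.7961,2019-01-13,I131 high dose
ABC-000107,1.2.824.0.2.4862015.07.383.5623.6820,2017-05-17,Bilat Mammography
ABC-000103,1.2.824.0.2.3886579.08.383.1010.6135,2018-12-11,Whole Body I-131 CT
ABC-000103,1.2.824.0.2.4653289.08.563.1010.4679,2018-12-23,Screening-Bilateral Mammography
"""

def add_timepoints(rows):
    """Number each patient's studies 1, 2, 3, ... by ascending StudyDate.

    The Episode column is added empty: it still has to be filled in manually.
    """
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row["PatientID"]].append(row)

    result = []
    for studies in by_patient.values():
        # Dates written consistently as YYYY-MM-DD (or YYYYMMDD) sort correctly as text.
        ordered = sorted(studies, key=lambda r: r["StudyDate"])
        for number, row in enumerate(ordered, start=1):
            result.append({**row, "Episode": "", "Imaging Timepoint": str(number)})
    return result

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
numbered = add_timepoints(rows)

# Write the result back out as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(numbered[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(numbered)
print(buffer.getvalue())
```

The numbering should still be reviewed manually, since only imaging procedures count and the assignment of each study to an episode requires clinical knowledge.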
- -**Table 5: Example of edited file with correspondence between StudyUID -and both Episode and Timepoint.** The part in blue corresponds to the +### +| **PatientID (0010,0020)** | **StudyUID (0020,000D)** | **StudyDate (0008,0020)** | **StudyDescription (0008,1030)** | **Episode** | **Imaging Timepoint** | +| ------------------------- | ------------------------------------------------------------------- | ------------------------- | -------------------------------- | ----------- | --------------------- | +| ABC-000103 | 1.2.824.0.2.3886579.08.383.1010.6135 | 2018-12-11 | Whole Body I-131 CT | Diagnosis | 1 | +| ABC-000103 | 1.2.824.0.2.4653289.08.563.1010.4679 | 2018-12-23 | Screening-Bilateral Mammography | Diagnosis | 2 | +| ABC-000103 | 1.2.824.0.2.06135249.08.647.2304.7961 | 2019-01-13 | I131 high dose | Treatment | 3 | +| ABC-000107 | 1.2.824.0.2.4862015.07.383.5623.6820 | 2017-05-17 | Bilat Mammography | Diagnosis | 1 | + +[Table 5](#tab_correspond_studyid): Example of edited file with correspondence between StudyUID +and both Episode and Timepoint. The part in blue corresponds to the part edited manually by the data holder. #### **Step 3: image annotation (optional)** You may want to annotate your imaging data to enrich your dataset. We -recommend using the [**MITK (Medical Imaging Interaction Toolkit) -Workbench**](https://bio.tools/mitk) that ensures the output format +recommend using the [**MITK (Medical Imaging Interaction Toolkit) Workbench**](https://bio.tools/mitk) that ensures the output format will be in the required format to be compliant with EUCAIM. Using it would avoid the burden (and the risk) of additional conversion procedures. Data can be also annotated using the DICOM Viewers from @@ -786,40 +573,33 @@ reference nodes environments after transferring the data (Step 7). Your imaging raw data must be in DICOM and your annotations in DICOM-SEG format. 
If you have existing annotation files that are not in DICOM-SEG, -you may use the EUCAIM [**Annotation Seg -converter**](https://hub.docker.com/r/mariov687/dicomseg) tool to +you may use the EUCAIM [**Annotation Seg converter**](https://hub.docker.com/r/mariov687/dicomseg) tool to convert them. #### **Step 4: De-identification** You must ensure that no identifiable information (direct or indirect) is -present in the dataset you will share (**Figure 9**). +present in the dataset you will share ([Figure 9](#fig_dataanon)). -The official tool for de-identification in EUCAIM is [**Lethe EUCAIM -Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer/). This tool ensures the specific PatientID code system. +The official tool for de-identification in EUCAIM is [**Lethe EUCAIM Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer/). This tool ensures the specific PatientID code system. Even if you are already anonymizing data using your own methods, we strongly recommend using the EUCAIM tool. The main reasons are: - **Unique Patient ID Generation**: Lethe Anonymizer automatically assigns a hashed PatientID to each patient. This mechanism ensures that the PatientID remains unique across the entire EUCAIM ecosystem, preventing any ID collisions between different DHs. This hash is generated using two components: - The original Patient ID. - The specific SiteID of the Data Holder. - **How to obtain your SiteID**: The SiteID is a required input for Lethe and can be retrieved from your User Profile in the EUCAIM Dashboard (UUID). To access this, you must log in with your institutional account, which must be properly registered in LS-AAI. You have to coordinate with your local IT department to ensure your institution is correctly integrated into the LS-AAI system. Google accounts or similar can’t be used to retrieve this SiteID. -- **Synchronizing Clinical Data**. 
To ensure your clinical data matches the hashed PatientIDs generated for the DICOM images, you can provide a CSV file during the anonymization process. The only requirement is that the first column must be the original PatientID. Lethe will then output: +- **Synchronizing Clinical Data**. To ensure your clinical data matches the hashed PatientIDs generated for the DICOM images, you can provide a CSV file during the anonymization process. The only requirement is that the first column must be the original PatientID. Lethe will then output: - The anonymized DICOM images. - A modified CSV file where the original IDs are replaced by the new hashed IDs.” -([Figure -7](https://eucaim.gitbook.io/handbook/datapreparation#bookmark=kix.br72yai62sd4)). The use of [**Lethe EUCAIM -Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer/) requires: +([Figure 7](#fig_datatools)). The use of [**Lethe EUCAIM Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer/) requires: - The patient ID linking clinical and imaging data must be identical and - listed as the first variable in the clinical dataset for tabular data; - + listed as the first variable in the clinical dataset for tabular data; - Your raw imaging data are in DICOM format; - The tool requires as input the SITE_ID - (**[Figure](https://eucaim.gitbook.io/handbook/datapreparation#fig_dataanon) - 9**), the unique identifier of the data provider, which is you can see - in your user profile from the [EUCAIM - Dashboard](https://dashboard.eucaim.cancerimage.eu/). In case your + ([Figure 9](#fig_dataanon)), the unique identifier of the data provider, which you can see + in your user profile from the [EUCAIM Dashboard](https://dashboard.eucaim.cancerimage.eu/). 
In case your Life Science account is not assigned to a known organization, then this will be empty and so you can create a ticket in the Helpdesk to request one; @@ -828,29 +608,25 @@ Special attention should be given to **embedded text** in images, that may contain patient-identifiable information, as well as **skull and head images** that pose a risk of patient re-identification. You may need to apply additional de-identification techniques to mitigate this -risk.\ -Tools: Tools such as the [**DICOM defacing -anonymisation**](https://bio.tools/dicom_defacing_anonymation) tool -from the EUCAIM catalogue (Figure 7) may be used to remove facial +risk. + +Tools: Tools such as the [**DICOM defacing anonymisation**](https://bio.tools/dicom_defacing_anonymation) tool +from the EUCAIM catalogue ([Figure 7](#fig_datatools)) may be used to remove facial features from your DICOM images. For 2D ultrasounds and mammography -**dataset**, you may use the [**Trace4MedicalImage -cleaning**](https://bio.tools/trace4medicalimagecleaning) tool, that -detects and removes encapsulated text in DICOM files. [The Lethe -EUCAIM -Anonymizer](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer) +**dataset**, you may use the [**Trace4MedicalImage cleaning**](https://bio.tools/trace4medicalimagecleaning) tool, that +detects and removes encapsulated text in DICOM files. [The Lethe EUCAIM Anonymizer](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer) tool also provides options to remove burned-in PHI pixel data from the images. 
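The idea behind the hashed PatientID described earlier in this step (a value derived from the original Patient ID and the SiteID, with the first column of the clinical CSV rewritten to match) can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not Lethe's implementation: the hash function (SHA-256), the way the two components are combined, and the function names `pseudonymize_id` and `pseudonymize_csv` are all hypothetical. Real datasets should be processed with the Lethe EUCAIM Anonymizer so that the resulting IDs follow the EUCAIM code system.

```python
import csv
import hashlib
import io

def pseudonymize_id(original_patient_id: str, site_id: str) -> str:
    """Derive a stable pseudonym from the SiteID and the original PatientID.

    Assumption: SHA-256 over "site_id:patient_id"; Lethe's real scheme may differ.
    """
    payload = f"{site_id}:{original_patient_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def pseudonymize_csv(csv_text: str, site_id: str) -> str:
    """Replace the first column (original PatientID) of a CSV with its pseudonym."""
    reader = csv.reader(io.StringIO(csv_text))
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")

    header = next(reader)
    writer.writerow(header)  # the header row is kept unchanged
    for row in reader:
        if not row:  # skip blank lines
            continue
        writer.writerow([pseudonymize_id(row[0], site_id)] + row[1:])
    return output.getvalue()

# Hypothetical inputs for illustration only.
SITE_ID = "00000000-0000-0000-0000-000000000000"
CLINICAL_CSV = "PatientID,Sex\nABC-000103,F\nABC-000107,M\n"

print(pseudonymize_csv(CLINICAL_CSV, SITE_ID))
```

Because the SiteID is part of the hashed input, the same local Patient ID used at two different sites yields two different pseudonyms, which is the collision-avoidance property described above.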
**Re-identification risk assessment for imaging and clinical data (optional)**: Before sharing your dataset, you should carefully assess -that no direct or indirect identifiers are present in your data.\ -Tools: Extraction of imaging metadata to feed -the wizard tool is possible by using the [**DICOM tags -extractor**](https://bio.tools/dicom_tags_extractor) tool (Figure -[7](https://eucaim.gitbook.io/handbook/datapreparation#fig_dataanon)). +that no direct or indirect identifiers are present in your data. + +Tools: Extraction of imaging metadata to feed +the wizard tool is possible by using the [**DICOM tags extractor**](https://bio.tools/dicom_tags_extractor) tool ( +[Figure 7](#fig_datatools)). Based on the EUCAIM CDM structure, ready-to-use hierarchies can be -imported in the [EUCAIM **Wizard -tool**](https://bio.tools/eucaim_wizard_tool) to initiate an +imported in the [EUCAIM **Wizard tool**](https://bio.tools/eucaim_wizard_tool) to initiate an analysis that is specifically tailored to the vocabulary and classification used in EUCAIM clinical metadata as well. The process and rationale is identical to the imaging metadata risk analysis, but the @@ -862,7 +638,7 @@ clinical and imaging information independently will work cumulatively for the overall data value. You must ensure that no identifiable information (direct or indirect) is -present in the dataset you will share (Figure 9). +present in the dataset you will share ([Figure 9](#fig_dataanon)). #### **Step 5: Data quality assessment** @@ -881,26 +657,22 @@ dataset is**: - **Showing integrity**: absence of data value loss or corruption -Tools: You may use dedicated tools to assess +Tools: You may use dedicated tools to assess the degree of compliance of your dataset to these principles. 
Some tools from the EUCAIM catalogue can help you to do so: -- The [**DICOM File integrity - checker**](https://bio.tools/dicom_file_integrity_checker_by_gibi230) +- The [**DICOM File integrity checker**](https://bio.tools/dicom_file_integrity_checker_by_gibi230) can check the **accuracy** and **integrity** of your imaging dataset. - For 2D ultrasounds and/or mammography **datasets,** **validity** - assessment is possible using the [**Trace4MedicalImage - cleaning**](https://bio.tools/trace4medicalimagecleaning) tool, + assessment is possible using the [**Trace4MedicalImage cleaning**](https://bio.tools/trace4medicalimagecleaning) tool, that detects and removes encapsulated text in DICOM files. - **Uniqueness** can be addressed with two EUCAIM tools that search for - image duplicates: the [**Image duplicates - checker**](https://bio.tools/dicom_image_similarity-duplicate_checker), + image duplicates: the [**Image duplicates checker**](https://bio.tools/dicom_image_similarity-duplicate_checker), capable of detecting duplicate or visually similar DICOM series by combining metadata analysis, hash-based comparison, and - pixel-level similarity metrics; the [**Image duplicate check - tool**](https://bio.tools/image_duplicate_check_tool), that + pixel-level similarity metrics; the [**Image duplicate check tool**](https://bio.tools/image_duplicate_check_tool), that detects duplicate DICOM images by analyzing pixel data. - The @@ -909,7 +681,6 @@ from the EUCAIM catalogue can help you to do so: for imaging and clinical data, such as its **completeness, uniqueness, validity, consistency, integrity.** -> · #### **Step 6: Data conversion to EUCAIM Common Data Model** @@ -923,14 +694,13 @@ a\) the mapping between the source metadata (clinical and imaging) and the EUCAIM CDM. b\) the actual transformation of all the clinical and imaging data to a -format compliant with the EUCAIM CDM through the use of the [**EUCAIM -ETL**](https://bio.tools/eetl_toolset). +format compliant with the EUCAIM CDM through the use of the [**EUCAIM ETL**](https://bio.tools/eetl_toolset). 
+format compliant with the EUCAIM CDM through the use of the [**EUCAIM ETL**](https://bio.tools/eetl_toolset). For your imaging dataset: > \- Fill in a tabular csv file with the correspondence between all the > possible values of SeriesDescription to the EUCAIM CDM standard -> vocabulary entries (Table 6). For all the SeriesDescription that you +> vocabulary entries [Table 6](#tab_correspond_series). For all the SeriesDescription that you > cannot map, keep the original values. They will serve to enrich the > EUCAIM CDM. > @@ -947,6 +717,7 @@ For your imaging dataset: > ingestion support team through the [EUCAIM > helpdesk](https://help.cancerimage.eu/). +### | **Source series Description** | **EUCAIM series description** | |---------------------------------------|-------------------------------| | AXIALT2TSE | T2 weighted | @@ -955,8 +726,8 @@ For your imaging dataset: | EP2D_DIFF_TRA_B50-1000_TRACEW_DFC_MIX | Diffusion weighted | | t2_tse_tra_p2_384ESTRICTO | T2 weighted | -**Table 6: Example of correspondence between the Series Description from -the source images and the Series Description from the EUCAIM standard.** +[Table 6](#tab_correspond_series): Example of correspondence between the Series Description from +the source images and the Series Description from the EUCAIM standard. The part in blue corresponds to the part edited manually by the data holder. See [**here**](https://docs.google.com/document/d/1mnTkf2fvERgaRyQPDFebZHLwB8aBRaIZRkwlMBr3ZXQ/edit?tab=t.0) @@ -983,13 +754,17 @@ is stored in its final destination, and proceed with the next steps. ## **Metadata registration in the public catalogue (mandatory)** -**I**n parallel to dataset preparation, the associated metadata must be +In parallel to dataset preparation, the associated metadata must be registered to the EUCAIM public catalogue. This can be done at any stage of dataset preparation, although we recommend doing it once the total -number of cases is final (e.g. after the data quality check). 
Table 5 +number of cases is final (e.g. after the data quality check). [Table 7](#tab_steps_meta_reg) below describes the steps to register your metadata. - +### +| Action | Description | Support | +| -------- | -------- | -------- | +| Provide the dataset's metadata in the spreadsheet template (Data Holder Template sheet) | The dataset schema can be downloaded from this [link](https://docs.google.com/spreadsheets/d/1cj6YzIAchHnEKlH612gO91WzHfEOB4TbwBrl9a0kgE0/edit?usp=sharing). In case of doubts with the terminology, use textual descriptions. | A helpdesk ticket on the category of catalogue. | +| Make a request of registry upload | Create a helpdesk ticket on the category catalogue, providing the spreadsheet file with the metadata information. The helpdesk team will contact you back informing if the dataset has been properly registered or requesting more information. | Same procedure | +| Verify the entries in the catalogue | Access the registry in the catalogue at the URL: https://catalogue.eucaim.cancerimage.eu/#/collection/ | Same procedure | -**Table 7**: Steps to submit the Metadata to the registry. +[Table 7](#tab_steps_meta_reg): Steps to submit the Metadata to the registry. diff --git a/DataSharingAnnex.md b/DataSharingAnnex.md index e949028..db895fe 100644 --- a/DataSharingAnnex.md +++ b/DataSharingAnnex.md @@ -1,406 +1,83 @@ +# B. Annex: Data sharing checklist + This section summarises in a comprehensive table all the actions to be performed in the case of Data Holders that will deploy a federated node. -

Initial Assessment

- - - - - - - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Submit an application for data incorporation use cases. - - Complete the application for data incorporation use cases form for the Access Committee to evaluate the scientific relevance of your participation in EUCAIM as a data holder. - - External application form Stakeholders Data Holders -
- Complete the TIERs maturity level questionnaire. - - Complete it to assess the readiness and compliance of your datasets and categorize them according to their maturity level (TIER 1, 2, or 3). - - https://dashboard.eucaim.cancerimage.eu/tier-maturity-level-questionnaire -
- Complete the DW maturity level questionnaire (only for clinical sites, as hospitals) - - Complete it to determine the current state of the hospital's Data Warehouse preparedness and maturity. - - https://dashboard.eucaim.cancerimage.eu/data-warehouse-maturity-questionnaire -
+

Initial Assessment

+ +| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Submit an application for data incorporation use cases. | Complete the application for data incorporation use cases form for the Access Committee to evaluate the scientific relevance of your participation in EUCAIM as a data holder. | - If you are a EUCAIM partner: [Call for use cases from EUCAIM partners](https://eu.jotform.com/233524103677050).
- If you are a EUCAIM stakeholder: [External application form - Stakeholders - Data Holders](https://form.jotform.com/251552461162350). | +| Complete the TIERs maturity level questionnaire. | Complete it to assess the readiness and compliance of your datasets and categorize them according to their maturity level (TIER 1, 2, or 3). | - [https://dashboard.eucaim.cancerimage.eu/tier-maturity-level-questionnaire](https://dashboard.eucaim.cancerimage.eu/tier-maturity-level-questionnaire) | +| Complete the DW maturity level questionnaire (only for clinical sites, such as hospitals) | Complete it to determine the current state of the hospital's Data Warehouse preparedness and maturity. | - [https://dashboard.eucaim.cancerimage.eu/data-warehouse-maturity-questionnaire](https://dashboard.eucaim.cancerimage.eu/data-warehouse-maturity-questionnaire) | + +</details>
+ +

Ethical and Legal

+ +| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Provide documentation | | - For more information see primarily the [Legal Handbook](https://docs.google.com/document/d/1U-RpFycjXEVP-4-l9ppveT654x78Dhlw/edit?tab=t.0), D4.4 [Final rules for participation report](https://drive.google.com/drive/folders/1dn1xQB9K7Fn3WzzqN5HRiQ7NiVwYt0yy) (See Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders))
- The template for the DPO report is also available here: [faq_DPO_template.docx](https://docs.google.com/document/d/1KHf1nlCxFB1BjhhQXHVo4zVSoOBorL_X/edit) | +| Data Sharing Agreement | Fill in and sign the DSA | - [DSA](https://drive.google.com/file/d/1-UyQ02w0-shmRgQgp8L1ATWs1JEco3_Y/view?usp=drive_link) | +| Define special Access Conditions | A document to be signed by the Data User that indicates the conditions under which the Data User can access the data. | - [Draft Template](https://drive.google.com/file/d/1UMdDF52mXGHNIL0GegzfyuSBVfKCIl7d/view) | + +</details>
+ +

Preliminaries

+ +| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Contact point for the negotiation (Only in federated nodes) | The LS-AAI details of the data holder delegate who will interact with the Data User through the negotiator. | - [Registration of users in EUCAIM LS-AAI.](https://drive.google.com/file/d/1EsFYxbzqpyYKggyeKrKKw3FkVecDby8P/view) | +| Get Familiar with EUCAIM | | - [https://dashboard.eucaim.cancerimage.eu](https://dashboard.eucaim.cancerimage.eu) <br>
- [https://eucaim.gitbook.io/end-user-guide](https://eucaim.gitbook.io/end-user-guide)
- [https://www.youtube.com/@EUCAIM](https://www.youtube.com/@EUCAIM)
- [https://training.eucaim.cancerimage.eu/](https://training.eucaim.cancerimage.eu/) | + +
+ +

Local Node (T1/2)

+| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Setup your local node | Deploy a node to host data and services to reach the desired interoperability level. | - Section 3.7 in [D5.6](https://drive.google.com/file/d/1URY8jtofLQpokTh7Hzag2wFFV9r1d_fs/view?usp=sharing) | +| Set up of the local catalogue (optional) | Deployment of a local instance of the catalogue. | - [Gitlab repository](https://gitlab.com/radiology/infrastructure/studies/eucaim/molgenis-emx2-eucaim) | +| Request a EUCAIM User | Request a EUCAIM User in the Dashboard. | - [Registration of users in EUCAIM](https://drive.google.com/file/d/1EsFYxbzqpyYKggyeKrKKw3FkVecDby8P/view) | -
-

Ethical and Legal

- - - - - - - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Provide documentation - -
    -
  • Proof of legal representation and legal basis if necessary.
  • -
  • A copy of a favorable ethical approval (if applicable).
  • -
  • A report from the DPO confirming legal compliance.
  • -
  • GDPR compliance.
  • -
  • Data Protection Impact Assessment (DPIA), if applicable.
  • -
  • Documents demonstrating the security of the information system.
  • -
  • Any documents required under your national legislation.
  • -
  • Evidence of an adequate anonymization/pseudonymization process that has been carried out.
  • -
  • Terms of Usage for the data.
  • -
-
- D4.4 Final rules for participation report (See Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders) -
- Data Sharing Agreement - - Fill-in and sign the DSA - - Draft DSA -
- Define especial Access Conditions - - A Document to be signed by the Data User that indicates the conditions under the Data User can access the data. - - Draft Template -
+
+

Data Preparation

-
-

Preliminaries

- - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Contact point for the negotiation (Only in federated nodes) - - The LS-AAl details of the data holder delegate who will interact with the Data User through the negotiator. - - Registration of users in EUCAIM LS-AAI. -
- Get Familiar with EUCAIM - -
    -
  • Follow the EUCAIM training material and brief documents.
  • -
  • Browse architecture and
  • -
  • Watch webinars and videos.
  • -
-
- https://dashboard.eucaim.cancerimage.eu
- https://eucaim.gitbook.io/end-user-guide
- https://www.youtube.com/@EUCAIM
- https://training.eucaim.cancerimage.eu -
+| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Extract the Imaging and clinical data | Use your own tools to extract the Medical Images and the clinical data | N/A | +| Annotate the data (optional) | Use your own annotation tool or the one selected by EUCAIM (MITK). Convert the annotations into DICOM-SEG. | - [MITK (Medical Imaging Interaction Toolkit) Workbench](https://bio.tools/mitk)
- [DicomSeg converter](https://hub.docker.com/r/mariov687/dicomseg) | +| Data de-identification | Ensure that no identifiable information is present in the dataset. If your imaging data are not already de-identified, you may use the EUCAIM Anonymizer. | - [Lethe DICOM Anonymizer](https://bio.tools/eucaim_dicom_anonymizer) | +| Re-identification risk assessment (optional) | Assess the risk of re-identification of patients based on your imaging metadata by checking hidden DICOM Tags. | - [Wizard](https://bio.tools/eucaim_wizard_tool) | +| Data Quality assessment (optional) | You may check the accuracy and integrity of your imaging dataset. | - [DICOM File integrity checker](https://bio.tools/dicom_file_integrity_checker_by_gibi230) | +
-
-

Local Node (T1/2)

- - - - - - - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Setup your local node - - Deploy a node to host data and services to reach the desired interoperability level. - - Section 3.7 in D5.6 -
- Set up of the local catalogue (optional) - - Deployment of a local instance of the catalogue. - - Gitlab repository -
- Request a EUCAIM User - - Request a EUCAIM User in the Dashboard. - - Registration of users in EUCAIM -
+

Catalogue Population

+| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Create local catalogue (optional) | Data should follow the EUCAIM interoperability schema. | - [Sample file with the schema](https://docs.google.com/spreadsheets/d/19DDoFq-_Bj7wfEf5KjkISe13kS-W5EYQ/edit?usp=sharing&ouid=102741390744373897413&rtpof=true&sd=true)
- [End User Guide](https://github.com/EUCAIM/End-User-Guide/blob/main/6-UserGuide4Members) | +| Make a request for catalogue registration | Create a helpdesk ticket in the catalogue category, providing the link to the dataset in the local catalogue, if available, or the completed catalogue metadata spreadsheet. The helpdesk team will reply to confirm that the dataset has been properly registered or to request more information. | - [https://help.cancerimage.eu](https://help.cancerimage.eu) <br>
- [Catalogue metadata spreadsheet](https://u.i3m.upv.es/9gx81) | -<details>
-

Data Preparation

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Extract the Imaging and clinical data - - Use your own tools to extract the Medical Images and the clinical data - - N/A -
- Annotate the data (optional) - - Use your own annotation tool or the one selected by EUCAIM (MITK). Convert the annotations into DICOM-SEG. - - MITK (Medical Imaging Interaction Toolkit) Workbench
- DicomSeg converter -
- Data de-identification - - Ensure that no identifiable information is present in the dataset. If your imaging data are not already de-identified, you may use the EUCAIM Anonymizer. - - Lethe DICOM Anonymizer -
- Re-identification risk assessment (optional) - - Assess the risk of re-identification of patients based on your imaging metadata by checking hidden DICOM Tags. - - Wizard -
- Data Quality assessment (optional) - - You may check the accuracy and integrity of your imaging dataset. - - DICOM File integrity checker -
+
+

Federated Search

-
-

Catalogue Population

- - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Create local catalogue (optional) - - Data should follow the EUCAIM interoperability schema. - - Sample file with the schema
- End User Guide -
- Make a request for catalogue registration - - Create a helpdesk ticket on the category catalogue, providing the link to the dataset in the local catalogue, if available, or the completed catalogue metadata spreadsheet. The helpdesk team will contact you back informing if the dataset has been properly registered or requesting more information. - - https://help.cancerimage.eu
- Catalogue metadata spreadsheet. -
+| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Definition of Mapping to CDM Template (optional) | A mapping template of the mandatory and other significant attributes to the CDM should be defined. | Tables 14 and 15 in [D5.6](https://drive.google.com/file/d/1URY8jtofLQpokTh7Hzag2wFFV9r1d_fs/view?usp=sharing) | +| ETL process | The ETL tool should be applied to map the clinical and imaging data to the CDM. | - [https://bio.tools/eetl_toolset](https://bio.tools/eetl_toolset) | +| Development of Mediator Component (optional) | Develop a mediator to connect the local searching API with the federated explorer. | - Section 5.2.1 Dataset in a Federated Node. subsection "Guidelines for creating a mapping component" in [D5.6](https://drive.google.com/file/d/1URY8jtofLQpokTh7Hzag2wFFV9r1d_fs/view?usp=sharing) | +| Deployment of search components. | Deploy the Beam Proxy and the Focus query dispatcher. | - [https://eucaim.gitbook.io/enduserguide/6-userguide4members](https://eucaim.gitbook.io/enduserguide/6-userguide4members) | +
-
-

Federated Search

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Definition of Mapping to CDM Template (optional) - - A mapping template of the mandatory and other significant attributes to the CDM should be defined. - - Tables 14 and 15 in D5.6 -
- ETL process - - The ETL tool should be applied to map the clinical and imaging data to the CDM. - - https://bio.tools/eetl_toolset -
- Development of Mediator Component (optional) - - Develop a mediator to connect the local searching API with the federated explorer. - - Section 5.2.1 Dataset in a Federated Node. subsection "Guidelines for creating a mapping component" in D5.6 -
- Deployment of search components. - - Deploy the Beam Proxy and the Focus query dispatcher. - - https//eucaim.gitbook.io/enduserguide/6-userguide4members -
+

Federated Processing

+| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Request registration in the federated explorer | Request the connection of the central instance of the federated search through a ticket in the helpdesk. | - [https://help.cancerimage.eu](https://help.cancerimage.eu) | +| Deployment of the FEM client | Deploy the container to run the service to interact with the federated processing. | - [https://gitlab.bsc.es/fl/fem-client](https://gitlab.bsc.es/fl/fem-client) | +| Deploy a federated computing node | Request technical support from the technical team through the helpdesk. | - [https://help.cancerimage.eu/](https://help.cancerimage.eu/) | -<details>
-

Federated Processing

- - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Request registration in the federated explorer - - Request the connection of the central instance of the federated search through a ticket in the helpdesk. - - https://help.cancerimage.eu -
- Deployment of the FEM client
- Deploy a federated computing node -
- Deploy the container to run the service to interact with the federated processing.
- Request technical support to the technical team through the helpdesk. -
- https://gitlab.bsc.es/fl/fem-client
- https://help.cancerimage.eu -
diff --git a/DataTransferAnnex.md b/DataTransferAnnex.md index 2f76d10..efd72a9 100644 --- a/DataTransferAnnex.md +++ b/DataTransferAnnex.md @@ -1,341 +1,65 @@ +# A. Annex: Data transfer checklist + This section summarises in a comprehensive table all the actions to be performed in the case of Data Holders that will upload their data into a reference node. +

Initial Assessment

+ +| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Submit an application for data incorporation use cases | Complete the application for data incorporation use cases form for the Access Committee to evaluate the scientific relevance of your participation in EUCAIM as a data holder. | - If you are a EUCAIM partner: [Call for use cases from EUCAIM partners](https://eu.jotform.com/233524103677050).
- If you are a EUCAIM stakeholder: [External application form - Stakeholders - Data Holders](https://form.jotform.com/251552461162350). | +| Complete the TIERs maturity level questionnaire | Complete it to assess the readiness and compliance of your datasets and categorize them according to their maturity level (TIER 1, 2, or 3). | - [TIERs maturity level questionnaire](https://dashboard.eucaim.cancerimage.eu/tier-maturity-level-questionnaire) | +| Complete the DW maturity level questionnaire (only for clinical sites, such as hospitals) | Complete it to determine the current state of the hospital's Data Warehouse preparedness and maturity. | - [Data Warehouse maturity level questionnaire](https://dashboard.eucaim.cancerimage.eu/data-warehouse-maturity-questionnaire) | + +</details>
+ +

Ethical and Legal

+ +| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Provide documentation | | - For more information see primarily the [Legal Handbook](https://docs.google.com/document/d/1U-RpFycjXEVP-4-l9ppveT654x78Dhlw/edit?tab=t.0), D4.4 [Final rules for participation report](https://drive.google.com/drive/folders/1dn1xQB9K7Fn3WzzqN5HRiQ7NiVwYt0yy) (See Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders))
- The template for the DPO report is also available here: [faq_DPO_template.docx](https://docs.google.com/document/d/1KHf1nlCxFB1BjhhQXHVo4zVSoOBorL_X/edit) | +| Data Transfer Agreement | Fill in and sign the DTA | - [DTA](https://drive.google.com/file/d/1TTuaFo4cWwomLJBtQbs_lkrBNFVSLH_L/view) | + +</details>
+ +

Preliminaries

-

Initial Assessment

- - - - - - - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Submit an application for data incorporation use cases - - Complete the application for data incorporation use cases form for the Access Committee to evaluate the scientific relevance of your participation in EUCAIM as a data holder. - - External application form - Stakeholders - Data Holders -
- Complete the TIERs maturity level questionnaire - - Complete it to assess the readiness and compliance of your datasets and categorize them according to their maturity level (TIER 1, 2, or 3). - - https://dashboard.eucaim.cancerimage.eu/tier-maturity-level-questionnaire -
- Complete the DW maturity level questionnaire (only for clinical sites, as hospitals) - - Complete it to determine the current state of the hospital's Data Warehouse preparedness and maturity. - - https://dashboard.eucaim.cancerimage.eu/data-warehouse-maturity-questionnaire -
+| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Get Familiar with EUCAIM | | - [https://dashboard.eucaim.cancerimage.eu](https://dashboard.eucaim.cancerimage.eu)
- [https://eucaim.gitbook.io/end-user-guide](https://eucaim.gitbook.io/end-user-guide)
- [https://www.youtube.com/@EUCAIM](https://www.youtube.com/@EUCAIM)
- [https://training.eucaim.cancerimage.eu/](https://training.eucaim.cancerimage.eu/) | +| Request a EUCAIM User | Request a EUCAIM User in the Dashboard through LS-AAI. | - [Registration of users in EUCAIM](https://drive.google.com/file/d/1EsFYxbzqpyYKggyeKrKKw3FkVecDby8P/view) | +
-
-

Ethical and Legal

- - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Provide documentation - -
    -
  • Proof of legal representation and legal basis if necessary.
  • -
  • A copy of a favorable ethical approval (if applicable).
  • -
  • A report from the DPO confirming legal compliance.
  • -
  • Any documents required under your national legislation
  • -
  • Terms of Usage for the data.
  • -
-
- D4.4 Final rules for participation report (See Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders) -
- Data Transfer Agreement - - Fill-in and sign the DTA - - - Draft DTA -
+

Data Preparation

+| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Extract Imaging and clinical data | Use your own tools to extract the Medical Images and the clinical data | N/A | +| Annotate the data (optional) | Use your own annotation tool or the one selected by EUCAIM (MITK). Convert the annotations into DICOM-SEG. | - [MITK (Medical Imaging Interaction Toolkit) Workbench](https://bio.tools/mitk)
- [DicomSeg converter](https://hub.docker.com/r/mariov687/dicomseg) | +| Data de-identification | Ensure that no identifiable information is present in the dataset. If your imaging data are not already de-identified, you may use the Lethe DICOM Anonymizer | - [Lethe DICOM Anonymizer](https://bio.tools/eucaim_dicom_anonymizer) | +| Re-identification risk assessment (optional) | Assess the risk of re-identification of patients based on your imaging metadata by checking hidden DICOM Tags. | - [Wizard](https://bio.tools/eucaim_wizard_tool) | +| Data Quality Assessment (optional) | You may check the accuracy and integrity of your imaging dataset | - [DICOM File integrity checker](https://bio.tools/dicom_file_integrity_checker_by_gibi230) | -
-

Preliminaries

- - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Get Familiar with EUCAIM - -
    -
  • Follow the EUCAIM training material and brief documents.
  • -
  • Browse architecture and
  • -
  • Watch webinars and videos.
  • -
-
- - https://dashboard.eucaim.cancerimage.eu
- - https://eucaim.gitbook.io/end-user-guide
- - https://www.youtube.com/@EUCAIM
- - https://training.eucaim.cancerimage.eu -
- Request a EUCAIM User - - Request a EUCAIM User in the Dashboard through LS-AAI. - - - Registration of users in EUCAIM -
+
+

Data Uploading

-
-

Data Preparation

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Extract Imaging and clinical data - - Use your own tools to extract the Medical Images and the clinical data - - N/A -
- Annotate the data (optional) - - Use your own annotation tool or the one selected by EUCAIM (MITK). Convert the annotations into DICOM-SEG. - - - MITK (Medical Imaging Interaction Toolkit) Workbench
- - DicomSeg converter -
- Data de-identification - - Ensure that no identifiable information is present in the dataset. If your imaging data are not already de-identified, you may use the Lethe DICOM Anonymizer- - - - Lethe DICOM Anonymizer -
- Re-identification risk assessment (optional) - - Assess the risk of re-identification of patients based on your imaging metadata by checking hidden DICOM Tags. - - - Wizard -
- Data Quality Assessment (optional) - - You may check the accuracy and integrity of your imaging dataset - - - DICOM File integrity checker -
+| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Provide Data Ingester Account Details | Open a ticket in the helpdesk, select the "Reference nodes" group (or "Technical support team" if unavailable) and add a request with the title "Create a data ingestion project in UPV" or "Create XNAT project in HealthRI" (depending on the Reference site), providing the name of the project and the EUCAIM username of the person who will manage it. The helpdesk will then reply to the ticket. | - [https://help.cancerimage.eu](https://help.cancerimage.eu/) | +| Download and install the Data Ingestion tool | Download the Data Ingestion tool for the UPV node and the Clinical Trial Processor (CTP) for HealthRI | - [QP-Insights Uploader](https://bio.tools/qp-insights_uploader) <br>
- [CTP](https://gitlab.com/radiology/infrastructure/data-curation-tools/ctp-standalone) | +| Request a user in the Reference node | Choose the reference node where the data will be uploaded (only one):
| - [Registration of users in UPV-eucaim-node](https://eucaim-node.i3m.upv.es/dataset-service/datasets?invalidated=false),
- [https://www.health-ri.nl/en/services/xnat](https://www.health-ri.nl/en/services/xnat) | +| Upload Imaging Data | Upload imaging data in the platform as described in the instructions (6.2.2 for UPV node and 6.2.3 for Health-RI). | - [User Guide for Data holders](https://eucaim.gitbook.io/enduserguide/6-userguide4members) | +| Upload clinical Data | Once medical imaging data is uploaded, you can proceed with the clinical data. If the process of converting the clinical data is expected to be long, we encourage you to create an “image-only” dataset by skipping this step. Use the same tool as before for UPV and XNATpy for Health-RI. Data can be in CSV or JSON. | - [QP-Insights Uploader](https://bio.tools/qp-insights_uploader)
- [XNATpy](https://xnat.readthedocs.io/en/latest/) | +
-
-

Data Uploading

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Provide Data Ingester Account Details - - Open a ticket in the helpdesk, select the "Reference nodes" group (or "Technical support team" if unavailable) and add a request with the title: "Create a data ingestion project in UPV" or "Create XNAT project in HealthRI" (depending on the Reference site), providing the name of the project, the username in EUCAIM who will manage it. An answer will be given soon. - - - https://help.cancerimage.eu -
- Download and install the Data Ingestion tool - - Download the Data Ingestion tool for the UPV node and the Clinical Trial Processor (CTP) for HealthRI - - - QP-Insights Uploader
- - CTP -
- Request a user in the Reference node - - Choose the reference node where the data will be uploaded (only one): -
    -
  • UPV (login button, register through LS-AAI and ask for a "Data Ingester" account)
  • -
-
- - Registration of users in UPV-eucaim-node, https://www.health-ri.nl/en/services/xnat - Health RI -
- Upload Imaging Data - - Upload imaging data in the platform as described in the instructions (6.2.2 for UPV node and 6.2.3 for Health-RI). - - - User Guide for Data holders, https://eucaim.gitbook.io/enduserguide/6-userguide4members -
- Upload clinical Data - - Once medical imaging data is uploaded, you can proceed with the clinical data. If the process of converting the clinical data is expected to be long, we encourage you to create an “image-only” dataset by skipping this step. Use the same tool as before for UPV and XNATpy for Health-RI. Data can be in CSV or JSON. - - - User Guide for Data holders, https://eucaim.gitbook.io/enduserguide/6-userguide4members
- - QP-Insights Uploader, https://bio.tools/qp-insights_uploader
- - XNATpy, https://xnat.readthedocs.io/en/latest/ -
+

Dataset Creation

+| Action | Description | Documents | +| ---------- | ---------- | ---------- | +| Create and Publish the Dataset | The dataset has to be created according to the instructions in the Gitbook (section 6.2.2.3 for UPV and 6.2.3 for Health-RI). | - [User Guide for Data holders](https://eucaim.gitbook.io/enduserguide/6-userguide4members) | +| Provide the dataset's metadata | Provide the metadata of the datasets according to the EUCAIM schema. If in doubt about the terminology, use textual descriptions. | - [EUCAIM Dataset metadata](https://docs.google.com/spreadsheets/d/1cj6YzIAchHnEKlH612gO91WzHfEOB4TbwBrl9a0kgE0/edit?usp=sharing) or [Molgenis excel template](https://docs.google.com/spreadsheets/d/19DDoFq-_Bj7wfEf5KjkISe13kS-W5EYQ/edit?usp=sharing&ouid=102741390744373897413&rtpof=true&sd=true) | +| Request a registry upload | Create a helpdesk ticket in the catalogue category, providing the spreadsheet file with the metadata information. The helpdesk team will reply to confirm that the dataset has been properly registered or to request more information. | - [https://help.cancerimage.eu/](https://help.cancerimage.eu/) | +| Verify the entries in the catalogue | Access the registry in the catalogue and verify the collection. | - [https://catalogue.eucaim.cancerimage.eu/#/collection/<\<identifier\>>](https://catalogue.eucaim.cancerimage.eu/#/collection/%3C%3Cidentifier%3E%3E) | -<details>
-

Dataset Creation

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ActionDescriptionDocuments
- Create and Publish the Dataset - - The dataset has to be created according to the instructions in the Gitbook (section 6.2.2.3 for UPV and 6.2.3 for Health-RI). - - User Guide for Data holders -
- Provide the dataset's metadata - - Provide the metadata of the datasets according to the EUCAIM schema. In case of doubts with the terminology, use textual descriptions. - - EUCAIM Dataset metadata or Molgenis excel template -
- Make a request of registry upload - - Create a helpdesk ticket on the category catalogue, providing the spreadsheet file with the metadata information. The helpdesk team will contact you back informing if the dataset has been properly registered or requesting more information. - - https://help.cancerimage.eu -
- Verify the entries in the catalogue - - Access the registry in the catalogue and verify the collection. - - https://catalogue.eucaim.cancerimage.eu/#/collection/<<identifier>> -
diff --git a/Federated.md b/Federated.md index 4c221ce..d7bb137 100644 --- a/Federated.md +++ b/Federated.md @@ -1,11 +1,13 @@ -# 7\. Option 2: Setting up a Federated Node {#7.-option-2:-setting-up-a-federated-node} +# 7\. Option 2: Setting up a Federated Node -This section describes the requirements and process of setting up a node, including the security and privacy considerations and the expected Service Level Agreement. It describes the requirements and setps to achieve Tier 1 to Tier 3 compliance. Additionally, [section 7.5]({#7.5.-setting-up-a-local-node-with-mini-node}) describes the EUCAIM mini-node Software package, which provides an open-source solution for setting up a minimal node, capable of reachin Tier-2 compliance at the level of the services and Tier-3 compliance at the level of the data. Data Holders that have not set up their own node could find in this package a helpful software stack to deploy their own nodes. +This section describes the requirements and process of setting up a node, including the security and privacy considerations and the expected Service Level Agreement. It describes the requirements and setps to achieve Tier 1 to Tier 3 compliance. Additionally, [section 7.5](#id-7.5.-setting-up-a-local-node-with-mini-node) describes the EUCAIM mini-node Software package, which provides an open-source solution for setting up a minimal node, capable of reachin Tier-2 compliance at the level of the services and Tier-3 compliance at the level of the data. Data Holders that have not set up their own node could find in this package a helpful software stack to deploy their own nodes. -## 7.1. Setting up the node {#7.1.-setting-up-the-node} +## 7.1. Setting up the node -Data holders who opt to host the data locally must set up a local node capable of storing and processing the data extracted, anonymised and standardised. The requirements for the node depend on the amount of data to be processed. 
[Table 7](#tab_localnodespec2) and [Table 8](#tab_localnodespec3) show the minimum required expected for Tier 2 and Tier 3 node. +Data holders who opt to host the data locally must set up a local node capable of storing and processing the data extracted, anonymised and standardised. The requirements for the node depend on the amount of data to be processed. [Table 7](#tab_localnodespectwo) and [Table 8](#tab_localnodespecthree) show the minimum required expected for Tier 2 and Tier 3 node. + +### | Hardware | Minimum | | :---- | :---- | @@ -14,7 +16,9 @@ Data holders who opt to host the data locally must set up a local node capable o | Operating System Drive | 160+ GB SSD | | Data Storage | 1x (Dataset size) Drives | -[Table 7](#table_localnodespec2): *Minimum hardware requirements for Tier 2 nodes.* +[Table 7](#tab_localnodespectwo): *Minimum hardware requirements for Tier 2 nodes.* + +### | Hardware | Minimum/Recommended | | :---- | :---- | @@ -24,27 +28,27 @@ Data holders who opt to host the data locally must set up a local node capable o | GPU | Minimum: \>150 Tensor Cores 16GB VRAM | | Motherboard | 4+ RAM Slot | -[Table 8](#table_localnodespec3)*: Minimum hardware requirements for Tier 3 nodes* +[Table 8](#tab_localnodespecthree): *Minimum hardware requirements for Tier 3 nodes* -A detailed description on the needs and components required for the local node can be found in [D5.6](https://drive.google.com/file/d/1URY8jtofLQpokTh7Hzag2wFFV9r1d_fs/view?usp=sharing%20), section 3.7*.* +A detailed description on the needs and components required for the local node can be found in [D5.6](https://drive.google.com/file/d/1URY8jtofLQpokTh7Hzag2wFFV9r1d_fs/view?usp=sharing%20), section 3.7. The processing capacity should be dimensioned to the amount of data available and it should be audited periodically. -### **7.1.1. Security and privacy considerations** {#7.1.1.-security-and-privacy-considerations} +### **7.1.1. 
Security and privacy considerations** As stated in section 3.2, the DHs who set up a local node have to demonstrate that the site implements good practices related to security and privacy preservation. Although it is not mandatory, a certification such as ISO/IEC 27001 and/or ISO/IEC 27701 would be appropriate to prove this capability. As a reference, the UPV reference node has a security and privacy document which is shared with the Data Holders in this link: [https://drive.google.com/file/d/1QK9pBuSwyMXNUdjIrNcd5khZb_Wpzmag/view?usp=drive_link](https://drive.google.com/file/d/1QK9pBuSwyMXNUdjIrNcd5khZb_Wpzmag/view?usp=drive_link). Ensure that the document includes the responsible persons and contacts in charge of the management of the site, the monitoring, backups and security incidents. The site should implement periodic security audits. External security audits are also encouraged. -### **7.1.2. Service Level Agreement** {#7.1.2.-service-level-agreement} +### **7.1.2. Service Level Agreement** The federated nodes have to guarantee that they commit enough resources to deal with the necessary level of service. This commitment should be made by signing a Service Level Agreement that declares the resources committed to the infrastructure, the Service support conditions, the committed Availability and Reliability and the contact points. As a basis, the UPV reference node has the Service Level Agreement publicly available at [https://eucaim-node.i3m.upv.es/dataset-service/web/sla.pdf](https://eucaim-node.i3m.upv.es/dataset-service/web/sla.pdf). -## 7.2. Tier 1 compliance {#7.2.-tier-1-compliance} +## 7.2. Tier 1 compliance Compliance at the Tier 1 level implies that the metadata of the datasets follow the EUCAIM DCAT-AP specification. In this case, the data holder can decide to register the datasets directly on the EUCAIM public catalogue or to set up its own federated registry. 
At this moment in time, we recommend the former, as the harvester will be released soon. -The registration of the dataset on the public catalogue has been described in section 5.1 / Table 4 of this document. The set up of a local catalogue is optional and it is described in [figure 10](#fig_tier1fednode) and [table 9](#tab_tier1fednode), and comprise the following actions: +The registration of the dataset on the public catalogue has been described in section 5.1 / Table 4 of this document. The set up of a local catalogue is optional and it is described in [figure 11](#fig_tier1fednode) and [table 9](#tab_tier1fednode), and comprise the following actions: - Dataset metadata preparation. This implies identifying the data to be shared and packaged into a dataset, the extraction of the metadata and the appropriate coding into the EUCAIM DCAT-AP terminology and vocabularies. This has been covered in section 5 of this document. @@ -52,28 +56,32 @@ The registration of the dataset on the public catalogue has been described in se - In the coming future, we will support the federation of datasets through a pull model in which datasets’ metadata is harvested by the central catalogue. This will require deploying a local registry and populating it with the information of the DH’s datasets. -Workflow for the tier 1 compliance in a Federated node +### -Figure 10: Workflow for the tier 1 compliance in a Federated node. +

Figure 11: Workflow for the tier 1 compliance in a Federated node.

+ +### | Action | Purpose | Link | | :---- | :---- | :---- | | Set up of the local catalogue | Deployment of a local instance of the catalogue to populate it with the information of the datasets provided by the Data Holder | [Gitlab repository](https://gitlab.com/radiology/infrastructure/studies/eucaim/molgenis-emx2-eucaim) | -| Population of the data | Data should follow the EUCAIM interoperability schema. A sample file can be used to fill-in the information of the datasets and to create the schemas on the database. Detailed information is provided in [https://github.com/EUCAIM/End-User-Guide/blob/main/6-UserGuide4Members.md\#631-tier-1-compliance](https://github.com/EUCAIM/End-User-Guide/blob/main/6-UserGuide4Members.md#631-tier-1-compliance) | [Sample file with the schema](https://docs.google.com/spreadsheets/d/19DDoFq-_Bj7wfEf5KjkISe13kS-W5EYQ/edit?usp=sharing&ouid=102741390744373897413&rtpof=true&sd=true). | +| Population of the data | Data should follow the EUCAIM interoperability schema. A sample file can be used to fill-in the information of the datasets and to create the schemas on the database. Detailed information is provided in [https://eucaim.gitbook.io/enduserguide/](https://eucaim.gitbook.io/enduserguide/6-userguide4members#id-6.3.2.-tier-1-compliance) | [Sample file with the schema](https://docs.google.com/spreadsheets/d/19DDoFq-_Bj7wfEf5KjkISe13kS-W5EYQ/edit?usp=sharing&ouid=102741390744373897413&rtpof=true&sd=true). | | Federation of the catalogue **(in progress)** | Enable automatic synchronisation of the local catalogue with the central one. | In progress | -[Table 9](#table_tier1fednode): Set up a federated Catalogue +[Table 9](#tab_tier1fednode): Set up a federated Catalogue -## 7.3. Tier 2 compliance {#7.3.-tier-2-compliance} +## 7.3. Tier 2 compliance The Tier 2 compliance implies that the data that is hosted at the federated node can be searched according to the searching variables defined in the CDM. 
At this point it is assumed that: - The Data Holder has set up a repository with the imaging and clinical data, in accordance with the data requirements outlined in Section 5.2 - The repository has a searching endpoint that can be accessed to retrieve the number of subjects and studies that fulfil a specific filtering criteria. -The steps needed to integrate the local node are described in [figure 11](#fig_tier2fednode). +The steps needed to integrate the local node are described in [figure 12](#fig_tier2fednode). + +### -![Figure_11](figures/image11.png)Figure 11: Actions to integrate a federated node in the tier 2 level. +

Figure 12: Actions to integrate a federated node in the tier 2 level.

The actions corresponding to the federated search are described in the gitbook ([https://eucaim.gitbook.io/enduserguide/6-userguide4members\#id-6.3.-contribution-through-a-federated-node](https://eucaim.gitbook.io/enduserguide/6-userguide4members#id-6.3.-contribution-through-a-federated-node)) . @@ -82,19 +90,19 @@ The steps that need to be developed are the following: | \# | Action | Documentation / Links | | :---- | :---- | :---- | | 1 | Metadata mapping | A mapping of the searchable items described in Tables 14 and 15 in D5.6 to the local variables should be defined. If the data is already transformed to the EUCAIM CDM (see Section 5.2), then this step is not required. | -| 2a | Mediator component deployment (recommended) | The deployment of the SQL-based mediator component can be done as a Docker container. Section 7.3.1 Describes the process (if you are using a [mini node](#7.5.-setting-up-a-local-node-with-mini-node), this will be performed automatically). | +| 2a | Mediator component deployment (recommended) | The deployment of the SQL-based mediator component can be done as a Docker container. Section 7.3.1 Describes the process (if you are using a [mini node](#75-setting-up-a-local-node-with-mini-node), this will be performed automatically). | | 2b | Mediator component development (optional) | If you are not exposing the data following the FHIR Standard or as a CDM-compliant PostresSQL (which is the result ), you should develop your own component to adapt the queries. An example of such component can be found in D5.6 “ Section 5.2.1 *Dataset in a Federated Node*, subsection “Guidelines for creating a mapping component”. | | 3 | Request registration in the explorer | Once the components are deployed, a ticket in the helpdesk, under the category “federated search” should be created with the request “register a new federated search provider”. 
| Once you have the component developed and deployed, the integration with the federated search central services requires several steps, which are detailed next. -### **7.3.1. Node Registration and Deployment** {#7.3.1.-node-registration-and-deployment} +### **7.3.1. Node Registration and Deployment** After submitting and having your registration request accepted, perform the following steps: **1\. Generate and Submit a CSR** -Create a Certificate Signing Request (CSR) with the Common Name (CN) set to your provider’s ID (the provider ID or your_id is an identifier for your organization chosen by you and accepted by the validator) plus the domain [broker.eucaim.cancerimage.eu](http://broker.eucaim.cancerimage.eu): +Create a Certificate Signing Request (CSR) with the Common Name (CN) set to your provider’s ID (the provider ID or your_id is an identifier for your organization chosen by you and accepted by the validator) plus the domain `broker.eucaim.cancerimage.eu`: ``` openssl req -key $REPO_ID.priv.pem -new \ @@ -102,9 +110,9 @@ openssl req -key $REPO_ID.priv.pem -new \ -out $REPO_ID.csr ``` -- $PROVIDER_ID.priv.pem: Name of the private key file to be generated. -- CN: Should be {your_id}.broker.eucaim.cancerimage.eu. The value of {your_id} should have been provided as a reply to the registration. -- C=, L=: Country and locality codes as needed. +- `$PROVIDER_ID.priv.pem`: Name of the private key file to be generated. +- `CN`: Should be `{your_id}.broker.eucaim.cancerimage.eu.` The value of `{your_id}` should have been provided as a reply to the registration. +- `C=`, `L=`: Country and locality codes as needed. Then, submit the resulting `.csr` file to the central node managers through the helpdesk, as a reply to the opened ticket. @@ -175,7 +183,7 @@ Once you have your metadata mapping, your Mediator component operational, the Ro All communications are performed using encrypted protocols (TLS 1.3). -## 7.4. Tier 3 compliance {#7.4.-tier-3-compliance} +## 7.4. 
Tier 3 compliance The following is the usual “step-by-step” procedure to deploy FEM-client, the component responsible for connecting a node to the EUCAIM’s federated network. @@ -201,7 +209,7 @@ The following is the usual “step-by-step” procedure to deploy FEM-client, th 4. **Final Setup & Testing** * After setup, we’ll run some tests to verify: 1\) Network connectivity; 2\) FEM-client’s ability to access local infrastructure and trigger container executions; and 3\) materialization of data for EUCAIM. -## 7.5. Setting up a local node with Mini-node {#7.5.-setting-up-a-local-node-with-mini-node} +## 7.5. Setting up a local node with Mini-node Data holders that do not have a local node could easily deploy a minimal node capable of providing access to data to data users and link to the EUCAIM federation by means of the EUCAIM mini node ([https://github.com/EUCAIM/mini-node](https://github.com/EUCAIM/mini-node)). The mini node currently features: @@ -213,7 +221,7 @@ Data holders that do not have a local node could easily deploy a minimal node ca The mini node will be extended with the capability of running batch jobs and the materialisator component to integrate with the Federated Processing. -### 7.5.1. Requirements {#7.5.1-requirements} +### 7.5.1. Requirements Mini node works on top of a Kubernetes cluster and uses scripts in Python. If the expected workload is limited (in the order of 5 concurrent users at most), the whole node can be set up on a single computer, following the Tier 2/3 hardware requirements described at the beginning of the section. Linux is preferable, but the setup of Kubernetes provides a virtualization layer that could overcome this requirement. With respect to the Kubernetes release, although the mini node manifests could work with any compatible distribution, we encourage the usage of [minikube](https://minikube.sigs.k8s.io/docs/). The installation of minikube is well described in the documentation available in the previous link. 
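Before starting a mini node installation, it can help to confirm that the command-line tools are reachable. The following is only a minimal sketch under stated assumptions: `minikube` is named in the requirements above, while `kubectl` and `helm` in the tool list, and the helper name `missing_tools`, are illustrative choices and not part of the mini-node package.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed tool list: minikube is named in the requirements; kubectl and helm
# are illustrative additions and may not match the actual prerequisites.
ASSUMED_TOOLS = ("minikube", "kubectl", "helm")


def missing_tools(
    tools: Iterable[str] = ASSUMED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH, in input order.

    The `which` parameter is injectable so the check can be exercised
    without depending on what is installed on the host.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing tools: " + ", ".join(absent))
    else:
        print("All assumed tools were found on the PATH.")
```

The lookup relies on the standard-library `shutil.which`, so it only reports whether an executable is on the `PATH`; it does not check versions or whether a cluster is running.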
@@ -227,7 +235,7 @@ Additionally, the host computer must have: The ([https://github.com/EUCAIM/mini-node](https://github.com/EUCAIM/mini-node)) repository contains the scripts and configuration files to automate the deployment of a mini EUCAIM node using Kubernetes and Minikube. It includes automated installation for Keycloak, Guacamole, and the Dataset Service, with all secrets and configuration injected from a single YAML file. -### 7.5.2. Installing the prerequisites {#7.5.2-installing-the-prerequisites} +### 7.5.2. Installing the prerequisites The following tools must be installed before running the installation: @@ -296,7 +304,7 @@ alias 'helm=minikube helm --` Newer versions of minikube do not support `helm` as an internal command. In this case, `helm` can be installed directly in the computer (see https://helm.sh/docs/intro/install/ ). -### 7.5.3. Installing the software {#7.5.3-installing-the-software} +### 7.5.3. Installing the software The installation script (`install.py`) automates the deployment of: - Keycloak (authentication/authorization) - Dataset Service diff --git a/Governance.md b/Governance.md index fc97a2c..c43b8f9 100644 --- a/Governance.md +++ b/Governance.md @@ -1,6 +1,6 @@ -# 2\. Governance {#2.-governance} +# 2. Governance -## 2.1. Data governance {#2.1.-data-governance} +## 2.1. Data governance Data provision in EUCAIM follows a structured process led by its operational boards, each playing a key role in ensuring that incoming datasets meet the platform’s scientific, technical and legal standards. @@ -8,14 +8,14 @@ When a Data Holder submits an application to join the federation using the **Exp Once all evaluations are completed, the Access Committee prepares a consolidated report which is sent to the **Management Board** and **Steering Committee** to make the final decision. Throughout the process, Data Holders are expected to collaborate closely with the involved boards, provide documentation, and requests for clarification. 
[Figure 1](#fig_dataprov) shows a graphical representation of this process. -Data provision workflow +### -Figure 1: Data provision workflow. +

Figure 1: Data provision workflow.

Once the data is registered and available through the EUCAIM Platform, the access for the Data Users will be submitted through the negotiator component and will be subject to the evaluation of the Access Committee. The AC evaluates the applications and informs the Management board and the DH, when needed. Federated DHs will be involved in the negotiation process for the agreement on the data access conditions. [Figure 2](#fig_dureq) shows a graphical schema of the process. -Data access request workflow. +### -Figure 2: Data access request workflow. +![Figure 2: Data access request workflow.](figures/image2.png) The Data Holders must provide a contact point, in case of a federated node, and should endorse the EUCAIM AC to request the signature of the access conditions in the case of transferring the data to a reference node. This is explained in more detail in the next section. diff --git a/Introduction.md b/Introduction.md index 5013be4..4b5d93e 100644 --- a/Introduction.md +++ b/Introduction.md @@ -1,4 +1,4 @@ -# 1\. Introduction {#1.-introduction} +# 1. Introduction This handbook is designed to guide **Data Holders** through the onboarding process for sharing or transferring data to the EUCAIM infrastructure. It outlines the roles, responsibilities, legal and technical requirements, and procedural steps to ensure compliance and facilitate smooth integration into the EUCAIM Federation. diff --git a/Onboarding.md b/Onboarding.md index e1a6fac..f7bdf0b 100644 --- a/Onboarding.md +++ b/Onboarding.md @@ -1,8 +1,8 @@ -# 3\. Onboarding Process {#3.-onboarding-process} +# 3. Onboarding Process EUCAIM defines a federated infrastructure in which nodes provide with data and services[^3]. -## 3.1. Initial requirements and commitments {#3.1.-initial-requirements-and-commitments} +## 3.1. 
Initial requirements and commitments **Before you start, pre-onboarding workflow**: @@ -29,21 +29,23 @@ EUCAIM defines a federated infrastructure in which nodes provide with data and s - GDPR-compliant documentation to be reviewed and approved by the institutional ethics committee. - DTA/DSA signature + other documentation, please go to section 3.2 Legal Documents of this Handbook. - Technical requirements: [Technical_requirements_Data_Holders](https://docs.google.com/document/d/1u0IPiPNcPivfECYzVvU6zXzh77jNrLojPeHIdLPjEhc/edit?usp=sharing) -4. Imaging and data preparation according to the EUCAIM [Common Data Model](https://eucaim-cdm.ics.forth.gr/) and [Hyperontology](https://eucaim-cdm.ics.forth.gr/). +4. Imaging and data preparation according to the EUCAIM [Common Data Model](https://eucaim-cdm.ics.forth.gr/) and [Hyperontology](https://hyperontology.eucaim.cancerimage.eu/). 5. Participation in monitoring, validation and quality assurance activities. Each step is supported by __tools, documentation, and expert teams__ from EUCAIM, ensuring Data Holders receive technical, legal, and procedural guidance throughout the process. -## 3.2. Legal documents. {#3.2.-legal-documents.} +## 3.2. Legal documents This section summarises the legal documentation that is required to become a Data Holder in EUCAIM. This information is much more detailed (and potentially more up-to-date) in the Legal Handbook of the project, available in this [link](https://docs.google.com/document/d/1U-RpFycjXEVP-4-l9ppveT654x78Dhlw/edit). We recommend going through the Legal Handbook when requesting and preparing the information and use the information below as a general guidance. A set of legal agreements must be prepared and signed to clearly state the obligations and responsibilities of the parties involved. 
The process is simpler in the case of Data Transfer Data Holders, as documents related to security and Service Level Agreements are provided by the reference nodes where the data will be deposited. Federated nodes have to provide a guarantee that they can fulfill the security and performance requirements[^5]. [Figure 3](#fig_legaldiagram) graphically shows the information and steps required for the legal framework of EUCAIM. -![Figure 3: Information and steps required to complete the legal framework of EUCAIM.](figures/image3.png) Figure 3: Information and steps required to complete the legal framework of EUCAIM. +### + +![Figure 3: Information and steps required to complete the legal framework of EUCAIM.](figures/image3.png) It is essential that the data holder provides a contact person of its legal team to be in close communication with the legal team of EUCAIM. A contact point will be assigned during the onboarding process. -The first step will be to Complete the ethical training via the Moodle platform ([https://training.eucaim.cancerimage.eu/](https://training.eucaim.cancerimage.eu/))[^6]. Then, the ethical and legal requirements for data holders are different depending on the collaboration model chosen: +The first step will be to Complete the ethical training[^6] via the Moodle platform ([https://training.eucaim.cancerimage.eu/](https://training.eucaim.cancerimage.eu/)). Then, the ethical and legal requirements for data holders are different depending on the collaboration model chosen: * **Data holders who agree to transfer data to a reference node**: @@ -84,27 +86,29 @@ The first step will be to Complete the ethical training via the Moodle platform In both cases it is compulsory that the DPO and/or the legal representative of the Data Holder confirm that they are aware about the transfer or sharing the data within EUCAIM and the security measures that must be taken. 
-[Table 1](#tab_DTA-1) summarises the actions for the Data Holders opting for the Data Transfer model and [Table 2](#tab_DSA-1) for the Data Holders who will set up a federated node. +[Table 1](#tab_dta1) summarises the actions for the Data Holders opting for the Data Transfer model and [Table 2](#tab_dsa1) for the Data Holders who will set up a federated node. + +### | Data Transfer | | | | :---- | :---- | :---- | | **Action** | **Description** | **Documents** | -| Provide documentation | - Proof of legal representative, and legal basis if necessary.
- A copy of a favorable ethical approval (if applicable).
- A report from the DPO confirming legal compliance.
- Security compliance.
- GDPR compliance.
- Data Protection Impact Assessment (if applicable).
- Any documents required under the national legislation.
- Evidence of an adequate anonymization/pseudonymization process that has been carried out | For more information see primarily the [Legal Handbook](https://docs.google.com/document/d/1U-RpFycjXEVP-4-l9ppveT654x78Dhlw/edit?tab=t.0), [D4.4 Final rules for participation report](https://drive.google.com/drive/folders/1dn1xQB9K7Fn3WzzqN5HRiQ7NiVwYt0yy) (See Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders) -Find also here the template for the DPO report: [faq_DPO_template.docx - Google Docs](https://docs.google.com/document/d/1KHf1nlCxFB1BjhhQXHVo4zVSoOBorL_X/edit) | -| Data Transfer Agreement | Fill-in and sign the DTA | [DTA](https://drive.google.com/file/d/1TTuaFo4cWwomLJBtQbs_lkrBNFVSLH_L/view?usp=drive_link) | -[Table 1](#table_DTA-1): Summary of steps to be completed for Data Transfer case. +| Provide documentation | - Proof of legal representative, and legal basis if necessary.
- A copy of a favorable ethical approval (if applicable).
- A report from the DPO confirming legal compliance.
- Security compliance.
- GDPR compliance.
- Data Protection Impact Assessment (if applicable).
- Any documents required under the national legislation.
- Evidence of an adequate anonymization/pseudonymization process that has been carried out | For more information see primarily the [Legal Handbook](https://docs.google.com/document/d/1U-RpFycjXEVP-4-l9ppveT654x78Dhlw/edit?tab=t.0), [D4.4 Final rules for participation report](https://drive.google.com/drive/folders/1dn1xQB9K7Fn3WzzqN5HRiQ7NiVwYt0yy) (See Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders))
- Find also here the template for the DPO report: [faq_DPO_template.docx - Google Docs](https://docs.google.com/document/d/1KHf1nlCxFB1BjhhQXHVo4zVSoOBorL_X/edit) | +| Data Transfer Agreement | Fill in and sign the DTA once all the legal documentation has been provided | [DTA](https://drive.google.com/file/d/1TTuaFo4cWwomLJBtQbs_lkrBNFVSLH_L/view?usp=drive_link) | + +[Table 1](#tab_dta1): Summary of steps to be completed for the Data Transfer case. + +### | Data Sharing | | | | :---- | :---- | :---- | | **Action** | **Description** | **Documents** | -| Provide documentation | - Proof of legal representative, and legal basis if necessary.
- A copy of a favourable ethical approval (if applicable).
- A report from the DPO confirming legal compliance.
- GDPR compliance.
- Data Protection Impact Assessment (if applicable).
- Evidence of an adequate anonymization/pseudonymization process that has been carried out.
- Documents demonstrating the security of the information system.
- Any documents required under your national legislation.
| For more information see primarily the [Legal Handbook](https://docs.google.com/document/d/1U-RpFycjXEVP-4-l9ppveT654x78Dhlw/edit?tab=t.0), [D4.4 Final rules for participation report](https://drive.google.com/drive/folders/1dn1xQB9K7Fn3WzzqN5HRiQ7NiVwYt0yy) (See Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders) -Find also here the template for the DPO report: [faq_DPO_template.docx - Google Docs](https://docs.google.com/document/d/1KHf1nlCxFB1BjhhQXHVo4zVSoOBorL_X/edit) | +| Provide documentation | - Proof of legal representative, and legal basis if necessary.
- A copy of a favourable ethical approval (if applicable).
- A report from the DPO confirming legal compliance.
- GDPR compliance.
- Data Protection Impact Assessment (if applicable).
- Evidence of an adequate anonymization/pseudonymization process that has been carried out.
- Documents demonstrating the security of the information system.
- Any documents required under your national legislation.
| For more information see primarily the [Legal Handbook](https://docs.google.com/document/d/1U-RpFycjXEVP-4-l9ppveT654x78Dhlw/edit?tab=t.0), [D4.4 Final rules for participation report](https://drive.google.com/drive/folders/1dn1xQB9K7Fn3WzzqN5HRiQ7NiVwYt0yy) (See Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders))
- Find also here the template for the DPO report: [faq_DPO_template.docx - Google Docs](https://docs.google.com/document/d/1KHf1nlCxFB1BjhhQXHVo4zVSoOBorL_X/edit) | | Data Sharing Agreement | Fill in and sign the DSA | [DSA](https://drive.google.com/file/d/1-UyQ02w0-shmRgQgp8L1ATWs1JEco3_Y/view?usp=drive_link) | | Define special Access Conditions | A document to be signed by the Data User that indicates the conditions under which the Data User can access the data. | [Draft Template](https://drive.google.com/file/d/1UMdDF52mXGHNIL0GegzfyuSBVfKCIl7d/view?usp=sharing) | | Contact point for the negotiation (Only in federated nodes) | The LS-AAI details of the data holder delegate who will interact with the Data User through the negotiator. | [Registration of users in EUCAIM](https://drive.google.com/file/d/1EsFYxbzqpyYKggyeKrKKw3FkVecDby8P/view) LS-AAI. | -[Table 2](#table_DSA-1): Summary of steps to be completed for Data Sharing case - +[Table 2](#tab_dsa1): Summary of steps to be completed for the Data Sharing case [^3]: *See [D5.6 Minimum Data Federation and Interoperability Framework](https://drive.google.com/file/d/1URY8jtofLQpokTh7Hzag2wFFV9r1d_fs/view?usp=sharing)* *section 3 and [https://eucaim.gitbook.io/architecture-of-eucaim/4.-detailed-architecture](https://eucaim.gitbook.io/architecture-of-eucaim/4.-detailed-architecture)* @@ -115,4 +119,4 @@ Find also here the template for the DPO report: [faq_DPO_template.docx - Google [^6]: *See D2.4 [Training Evaluation: Guidelines, Best Practices, Lessons Learned](https://drive.google.com/file/d/1hNCkrP8UutNiPexzAzpsdt3WDOwdVh66/view?usp=drive_link).* -[^7]: See D4.4 [Final rules for participation report](https://drive.google.com/drive/folders/1dn1xQB9K7Fn3WzzqN5HRiQ7NiVwYt0yy), Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders). 
+[^7]: *See D4.4 [Final rules for participation report](https://drive.google.com/drive/folders/1dn1xQB9K7Fn3WzzqN5HRiQ7NiVwYt0yy), Sections 4.4.1 (Legal requirements) and 4.4.2 (Ethical requirements for Data Holders)*. diff --git a/README.md b/README.md index 746cc1c..d3f4b91 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ **Project title:** European Federation for Cancer Images -![image](figures/image0.png) +
 **Disclaimer**
@@ -24,56 +24,55 @@ Table of contents
-[**1\. Introduction**](https://eucaim.gitbook.io/handbook/introduction){:target="_blank"}
+[**1\. Introduction**](Introduction.md)
-
-[**2\. Governance**](https://eucaim.gitbook.io/handbook/governance)
+[**2\. Governance**](Governance.md)
-    [2.1. Data governance](Governance.md#data-governance)
+    [2.1. Data governance](Governance.md#id-2.1.-data-governance)
-[**3\. Onboarding Process**](https://eucaim.gitbook.io/handbook/onboarding)
+[**3\. Onboarding Process**](Onboarding.md)
-    [3.1. Initial requirements and commitments](Onboarding.md#initial-requirements-and-commitments)
+    [3.1. Initial requirements and commitments](Onboarding.md#id-3.1.-initial-requirements-and-commitments)
-    [3.2. Legal documents.](Onboarding.md#legal-documents.)
+    [3.2. Legal documents](Onboarding.md#id-3.2.-legal-documents)
 [**4\. Support and communication**](Support.md)
-    [4.1. Engagement Team](Support.md#engagement-team)
+    [4.1. Engagement Team](Support.md#id-4.1.-engagement-team)
-    [4.2. Helpdesk](Support.md#helpdesk)
+    [4.2. Helpdesk](Support.md#id-4.2.-helpdesk)
-    [4.3. EUCAIM training platform: Overview of courses and access](Support.md#eucaim-training-platform)
+    [4.3. EUCAIM training platform: Overview of courses and access](Support.md#id-4.3.-eucaim-training-platform-overview-of-courses-and-access)
 [**5\. Data Preparation process**](DataPreparation.md)
-    [5.1. Data Preparation tools](DataPreparation.md#data-preparation-tools)
+    [5.1. Data Preparation tools](DataPreparation.md#id-5.1.-data-preparation-and-related-tools-from-the-eucaim-catalogue)
-    [5.2. Tier 1 datasets](DataPreparation.md#tier-1-datasets)
+    [5.2. Tier 1 datasets](DataPreparation.md#id-5.2.-tier-1-datasets)
-    [5.3. Tier 2 and 3 datasets](DataPreparation.md#tier-2-and-3-datasets)
+    [5.3. Tier 2 and 3 datasets](DataPreparation.md#id-5.3.-tiers-2-and-3-datasets)
 [**6\. Option 1: Transfer to Reference Node**](Transfer.md)
-    [6.1. Reference Nodes](Transfer.md#reference-nodes)
+    [6.1. Reference Nodes](Transfer.md#id-6.1.-reference-nodes)
-    [6.2. Transferring data to the nodes](Transfer.md#transferring-data-to-the-nodes)
+    [6.2. Transferring data to the nodes](Transfer.md#id-6.2.-transferring-data-to-the-nodes)
 [**7\. Option 2: Setting up a Federated Node**](Federated.md)
-    [7.1. Setting up the node](Federated.md#setting-up-the-node)
+    [7.1. Setting up the node](Federated.md#id-7.1.-setting-up-the-node)
-        [7.1.1. Security and privacy considerations](Federated.md#security-and-privacy-considerations)
+        [7.1.1. Security and privacy considerations](Federated.md#id-7.1.1.-security-and-privacy-considerations)
-        [7.1.2. Service Level Agreement](Federated.md#service-level-agreement)
+        [7.1.2. Service Level Agreement](Federated.md#id-7.1.2.-service-level-agreement)
-    [7.2. Tier 1 compliance](Federated.md#tier-1-compliance)
+    [7.2. Tier 1 compliance](Federated.md#id-7.2.-tier-1-compliance)
-    [7.3. Tier 2 compliance](Federated.md#tier-2-compliance)
+    [7.3. Tier 2 compliance](Federated.md#id-7.3.-tier-2-compliance)
-        [7.3.1. Node Registration and Deployment](Federated.md#7.3.1.-node-registration-and-deployment)
+        [7.3.1. Node Registration and Deployment](Federated.md#id-7.3.1.-node-registration-and-deployment)
-    [7.4. Tier 3 compliance](Federated.md#7.4.-tier-3-compliance)
+    [7.4. Tier 3 compliance](Federated.md#id-7.4.-tier-3-compliance)
 [**8\. References**](References.md)
diff --git a/References.md b/References.md
index 00482a3..1a03853 100644
--- a/References.md
+++ b/References.md
@@ -11,7 +11,7 @@
 \[5\] Registration of users in EUCAIM LS-AAI. [https://drive.google.com/file/d/1EsFYxbzqpyYKggyeKrKKw3FkVecDby8P/view](https://drive.google.com/file/d/1EsFYxbzqpyYKggyeKrKKw3FkVecDby8P/view)
-\[6\] EUCAIM Dashboard Manual. [https://eucaim.gitbook.io/eucaim-dashboard](https://eucaim.gitbook.io/eucaim-dashboard)
+\[6\] EUCAIM Dashboard Page. [https://dashboard.eucaim.cancerimage.eu/documentation](https://dashboard.eucaim.cancerimage.eu/documentation)
 \[7\] End user Guide of the Platform Services for the different profiles [https://eucaim.gitbook.io/end-user-guide](https://eucaim.gitbook.io/end-user-guide)
diff --git a/Support.md b/Support.md
index 50d9560..4c13a0c 100644
--- a/Support.md
+++ b/Support.md
@@ -1,13 +1,13 @@
-# 4\. Support and communication {#4.-support-and-communication}
+# 4. Support and communication
-## 4.1. Engagement Team {#4.1.-engagement-team}
+## 4.1. Engagement Team
 As the first line of support for Data Holders, the **Engagement Team** will reach out to them to start the Onboarding process to the EUCAIM Federation.
 - The __Engagement Team Coordinator__ will focus on coordinating the relationships with the Data Holders by providing them with the information and resources that will guide them through each step of the onboarding process as well as collecting information from their centre’s current situation.
 - Each DH’s DPO will be contacted by the __Engagement Team’s legal support team__ directly to address the legal requirements. DH’s will also be assigned with one of the __Engagement Team’s technicians__ that will support them in any technical matter, but this will be mainly addressed also via the
 - Helpdesk which will be explained as follows.
-## 4.2. Helpdesk {#4.2.-helpdesk}
+## 4.2. Helpdesk
 The information on the usage of the platform is available in different sources:
@@ -21,7 +21,10 @@ The information on the usage of the platform is available in different sources:
 - The HelpDesk, for requesting support when facing issues or when direct interaction is required (e.g. for requesting the uploading data, registering a dataset, etc.).
-![Figure 4](figures/image4.png) Figure 4: Sources of information for Data Holders
+###
+![Figure 4: Sources of information for Data Holders.](figures/image4.png)
+
+###
 | Support source | Purpose | Link (s) |
 | :---- | :---- | :---- |
@@ -43,7 +46,8 @@ You may contact the Helpdesk via two paths:
 The issue will be addressed within 48 hours, and the data holder will receive an answer by email as well as in the helpdesk interface.
-### **4.3. EUCAIM training platform: Overview of courses and access** {#4.1.1.-eucaim-training-platform:-overview-of-courses-and-access}
+## 4.3. EUCAIM training platform: Overview of courses and access
+
+###
+![Figure 5: Schema of the training modules.](figures/image5.png)
-![Figure 5](figures/image5.png)
-Figure 5: Schema of the training modules
diff --git a/Transfer.md b/Transfer.md
index 4cf8cf6..bb54775 100644
--- a/Transfer.md
+++ b/Transfer.md
@@ -1,6 +1,6 @@
-# 6\. Option 1: Transfer to Reference Node {#6.-option-1:-transfer-to-reference-node}
+# 6\. Option 1: Transfer to Reference Node
-## 6.1. Reference Nodes {#6.1.-reference-nodes}
+## 6.1. Reference Nodes
 EUCAIM has set up two reference nodes to host data transferred from the data holders. These two reference nodes are complementary and use compatible but different technologies.
@@ -8,25 +8,34 @@ EUCAIM has set up two reference nodes to host data transferred from the data hol
 - The Euro-BioImaging Medical Imaging Repository ([https://xnat.health-ri.nl](https://xnat.health-ri.nl)) is a platform operated by Health-RI ([https://www.health-ri.nl/en/services/xnat](https://www.health-ri.nl/en/services/xnat)) for storing and managing imaging provided as a service through the Euro-BioImaging ERIC. XNAT is an extensible open-source imaging platform that simplifies common tasks in imaging data management. The Imaging Data should be stored in DICOM format if that is available, but can be also stored in other formats like NIfTI, and derived data and clinical data can also be stored in appropriate file formats (CSV or JSON).
-Details on the features supported by each Reference node are provided in [D5.6, section 4.3.3](https://cancerimage.eu/wp-content/uploads/2026/02/D5.6.-Minimum-Data-Federation-and-Interoperability-Framework.pdf). All communications are performed using encrypted protocols (TLS 1.3).
+Details on the features supported by each Reference node are provided in [D5.6](https://cancerimage.eu/wp-content/uploads/2026/02/D5.6.-Minimum-Data-Federation-and-Interoperability-Framework.pdf), section 4.3.3. All communications are performed using encrypted protocols (TLS 1.3).
-## 6.2. Transferring data to the nodes {#6.2.-transferring-data-to-the-nodes}
+## 6.2. Transferring data to the nodes
-The workflow for uploading the datasets is described in [figure 9](#fig_dataing1). [Table 5](#tab_6REFUPV) and [Table 6](#tab_6REFHRI) describe the individual steps required to fulfil the process for each one of the reference nodes.
+The workflow for uploading the datasets is described in [Figure 10](#fig_dataing1). [Table 5](#tab_refupv) and [Table 6](#tab_refhri) describe the individual steps required to fulfil the process for each one of the reference nodes.
-![Figure 9](figures/image9.png)
-Figure9: Steps in the process of transferring data to the nodes. Steps in purple are covered in section 5\. Steps in blue (5-10) are described in tables [Table 5](#tab_6REFUPV) and [Table 6](#tab_6REFHRI).
+###
+
+![Figure 10. Steps in the process of transferring data to the nodes. Steps in purple are covered in section 5\. Steps in blue (5-10) are described in tables Table 5 and Table 6 .](figures/image9.png)
+
+###
 | \# | Action | Documentation / Links |
-| :---- | :---- | :---- |
+| :---- | :---- | :--------------- |
 | 5 | Request a EUCAIM User | [Registration of users in EUCAIM](https://drive.google.com/file/d/1EsFYxbzqpyYKggyeKrKKw3FkVecDby8P/view) [Registration of users in eucaim-node](https://eucaim-node.i3m.upv.es/dataset-service/datasets?invalidated=false), login button, register through LS-AAI and ask for a “Data Ingester” account. |
 | 6 | Provide Data Ingester Account Details | Open a ticket in [https://help.cancerimage.eu](https://help.cancerimage.eu), select the “Reference nodes” group (or “Technical support team” if unavailable) and add a request with the title: “Create a data ingestion imaging biobank” and providing a name for the biobank, the username in EUCAIM who will manage it (step \#4) and an URL if available. An answer will be given soon. |
 | 7 | Download and install the Data Ingestion tool | The data ingestion tool is a Python web applications. In case of trouble, you can request an issue in the helpdesk in the same category as above. The details for downloading and installing the tool are available in [https://bio.tools/data\_ingestion\_tool\_upv\_reference\_node](https://bio.tools/data_ingestion_tool_upv_reference_node) . |
-| 8 | Upload Imaging Data | The instructions for using the tool are available in the User Guide for Data Holders ([https://eucaim.gitbook.io/end-user-guide/user-guide-for-data-holders](https://eucaim.gitbook.io/end-user-guide/user-guide-for-data-holders)). Proceed initially uploading medical imaging data. |
-| 9 | Upload clinical Data | Once medical imaging data is uploaded, you can proceed with the clinical data. If you decide to convert the data through the [ETL](https://www.google.com/url?q=https://bio.tools/eetl_toolset&sa=D&source=docs&ust=1746522946004469&usg=AOvVaw03nEW7Q5sOpFaDNIZA6ogQ) inside the node and this process is expected to be long, we encourage you to create an “image-only” dataset by skipping this step. |
-| 10 | Create and Publish the Dataset | The creation of a dataset is described in section 6.2.2.3 of the the User Guide for Data Holders ([https://eucaim.gitbook.io/end-user-guide/user-guide-for-data-holders](https://eucaim.gitbook.io/end-user-guide/user-guide-for-data-holders)). Once the dataset is created, additional metadata of the dataset can be added and the dataset published as described in 6.2.3.4 of the same document. The user can “release” the dataset and then it will be validated by the responsible of the platform. Once it is validated, the dataset will be published. Publishing a dataset only exposes the aggregated metadata and no individual data is released. |
+| 8 | Upload Imaging Data | The instructions for using the tool are available in the User Guide for Data Holders ([https://eucaim.gitbook.io/enduserguide/6-userguide4members](https://eucaim.gitbook.io/enduserguide/6-userguide4members)). Proceed initially uploading medical imaging data. |
+| 9 | Upload clinical Data | Once medical imaging data is uploaded, you can proceed with the clinical data. If you decide to convert the data through the [ETL](https://bio.tools/eetl_toolset) inside the node and this process is expected to be long, we encourage you to create an “image-only” dataset by skipping this step. |
+> ⚠️ Attention! Please, do not proceed with the metadata release until you are declared legally compliant by your correspondent EUCAIM legal team member after providing all the requirements and the DTA/DSA has been signed by both the legal representative of your institution and EUCAIM’s Scientific Director (Dr. Luís Martí).
-[Table 5](#table_6REFUPV): Uploading data in the UPV reference node. Steps 1 to 4 are described in section 5\.
+| | | |
+|---|---|---|
+| 10 | Create and Publish the Dataset | The creation of a dataset is described in section 6.2.2.3 of the the User Guide for Data Holders ([https://eucaim.gitbook.io/enduserguide/6-userguide4members#id-6.2.-contribution-through-data-transfer](https://eucaim.gitbook.io/enduserguide/6-userguide4members#id-6.2.-contribution-through-data-transfer)). Once the dataset is created, additional metadata of the dataset can be added and the dataset published as described in 6.2.3.4 of the same document. The user can “release” the dataset and then it will be validated by the responsible of the platform. Once it is validated, the dataset will be published. Publishing a dataset only exposes the aggregated metadata and no individual data is released. |
+
+[Table 5](#tab_refupv): Uploading data in the UPV reference node. Steps 1 to 4 are described in section 5\.
+
+###
 | \# | Action | Documentation / Links |
 | :---- | :---- | :---- |
@@ -34,8 +43,12 @@ Figure9: Steps in the process of transferring data to the nodes. Steps in purple
 | 5 | Request a EUCAIM User | [Registration of users in EUCAIM](https://drive.google.com/file/d/1EsFYxbzqpyYKggyeKrKKw3FkVecDby8P/view) |
 | 6 | Provide Data Ingester Account Details | Open a ticket in [https://help.cancerimage.eu](https://help.cancerimage.eu), select the “Reference nodes” group (or “Technical support team” if unavailable) and add a request with the title: “Create XNAT project” and providing the name of the project, the username in EUCAIM who will manage it (step \#4) and an URL if available. An answer will be given soon. |
 | 7 | Download and install CTP | The data ingestion tool for imaging is the Clinical Trial Processor (CTP). The standalone version can be downloaded here: [https://gitlab.com/radiology/infrastructure/data-curation-tools/ctp-standalone](https://gitlab.com/radiology/infrastructure/data-curation-tools/ctp-standalone). |
-| 8 | Upload Imaging Data | Upload the imaging data using CTP, see manual here: [https://gitlab.com/radiology/infrastructure/data-curation-tools/ctp-standalone/-/blob/a195e33ff9711da8e5abefc8285c443b40b8502a/Manuals/EUCAIM%20XNAT%20Central%20Repository.pdf](https://gitlab.com/radiology/infrastructure/data-curation-tools/ctp-standalone/-/blob/a195e33ff9711da8e5abefc8285c443b40b8502a/Manuals/EUCAIM%20XNAT%20Central%20Repository.pdf) |
+| 8 | Upload Imaging Data | Upload the imaging data using CTP, see manual here: [https://gitlab.com/radiology/infrastructure/data-curation-tools/ctp-standalone/...](https://gitlab.com/radiology/infrastructure/data-curation-tools/ctp-standalone/-/blob/a195e33ff9711da8e5abefc8285c443b40b8502a/Manuals/EUCAIM%20XNAT%20Central%20Repository.pdf) |
 | 9 | Upload clinical Data | Once medical imaging data is uploaded, you can proceed with the clinical data. [XNATpy](https://xnat.readthedocs.io/en/latest/) can be used to upload the CSV or JSON to XNAT. |
+> ⚠️ Attention! Please, do not proceed with the metadata release until you are declared legally compliant by your correspondent EUCAIM legal team member after providing all the requirements and the DTA/DSA has been signed by both the legal representative of your institution and EUCAIM’s Scientific Director (Dr. Luís Martí).
+
+| | | |
+|---|---|---|
 | 10 | Create and Publish the Dataset | The project in XNAT should be set to protected (or public) to make the metadata visible. |
-[Table 6](#table_6REFHRI): Uploading data in the Health-RI reference node. Steps 1 to 3 are described in section 5\.
+[Table 6](#tab_refhri): Uploading data in the Health-RI reference node. Steps 1 to 3 are described in section 5\.
diff --git a/figures/image5.png b/figures/image5.png
index 1d24247..c545ab6 100644
Binary files a/figures/image5.png and b/figures/image5.png differ