diff --git a/DataPreparation.md b/DataPreparation.md
index 3978d1b..df4e19d 100644
--- a/DataPreparation.md
+++ b/DataPreparation.md
@@ -192,8 +192,7 @@ For negative screening/control groups, **region and laterality are not mandatory
## **Data preparation and related tools from the EUCAIM catalogue**
For the purpose of data preparation, several tools have been selected
-and developed in EUCAIM. [Figure
-7](https://eucaim.gitbook.io/handbook/datapreparation#fig_datatools)
+and developed in EUCAIM. [Figure 7](#fig_datatools)
shows the main tools selected for this phase.
***Use of EUCAIM-provided tools***
@@ -207,27 +206,25 @@ Please read the sections below carefully. EUCAIM
technical support team can assist you
throughout this process via the Helpdesk.
-| | |
+
+| | |
|---|---|
-|  |  |
-|  |  |
-|  |  |
-|  |  |
-|  | |
-|  | |
-
-[Figure
-7](https://eucaim.gitbook.io/handbook/datapreparation#figur_datatools):
-EUCAIM data preparation tools for data holders. Click on the thumbnail
-for more information about the tool.
+| | |
+| | |
+| | |
+| | |
+| | |
+| | |
+
+**Figure 7:** EUCAIM data preparation tools for data holders. Click on the thumbnail for more information about the tool.
Instructions on the downloading and usage of each tool are given in the
links provided in the description of the tools in the bio.tools
catalogue.
Data holders can get information about the data preparation tools
-(listed in the following subsections) in the bio tools catalogue
-([https://bio.tools/t?domain=eucaim](https://bio.tools/t?domain=eucaim)).
+(listed in the following subsections) in the
+bio.tools catalogue.
The binaries of the tools can be downloaded from:
- the EUCAIM Software artifacts registry, the EUCAIM harbor
@@ -236,11 +233,10 @@ The binaries of the tools can be downloaded from:
#### Access to the EUCAIM Software artifacts registry (Harbor)
-([https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories))
+(https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories)
The access to the registry requires a valid account and additional
-permissions that can be requested on the first access to the registry. Instructions on how to request access and download tools are available [here
-](https://drive.eucaim.cancerimage.eu/s/pxpTJWSTFsLbqPQ?dir=/&editing=false&openfile=true)\.
+permissions that can be requested on the first access to the registry. Instructions on how to request access and download tools are available here.
It is advisable that once data holders request access to the registry, they open a ticket in the EUCAIM
helpdesk - in the enrollment group - to speed up the process of approval
@@ -253,7 +249,7 @@ the Harbor repository and download the required tools.
#### Access to the EUCAIM drive repository
-([https://drive.eucaim.cancerimage.eu/apps/files/files/1520?dir=/Applications](https://drive.eucaim.cancerimage.eu/apps/files/files/1520?dir=/Applications))
+(https://drive.eucaim.cancerimage.eu/apps/files/files/1520?dir=/Applications)
## **Tier 1 datasets**
@@ -273,9 +269,7 @@ to be transferred to a reference node.
You may want to annotate your imaging data to enrich the quality of your
dataset.
-Tools: We recommend using the [**MITK
-(Medical Imaging Interaction Toolkit)
-Workbench**](https://bio.tools/mitk), which ensures the output
+Tools: We recommend using the MITK (Medical Imaging Interaction Toolkit) Workbench, which ensures the output
format will be in the required format to be compliant with EUCAIM. Using
it would avoid the burden (and the risk) of additional conversion
procedures. Data can be also annotated using the DICOM Viewers from
@@ -285,62 +279,60 @@ reference node environments after transferring the data.
imaging raw data are in DICOM format, and that your annotations are in
DICOM-SEG.\
Tools: If you have existing annotation files
-that are not in DICOM-SEG, you may use the EUCAIM [**Annotation Seg
-converter**](https://hub.docker.com/r/mariov687/dicomseg) tool to
-convert them.
+that are not in DICOM-SEG, you may use the EUCAIM Annotation Seg
+converter tool to convert them.
#### **Step 2: De-identification**
You must ensure that no identifiable information (direct or indirect) is
-present in the dataset you will share (Figure 9).
+present in the dataset you will share.
-***Important points to consider before
-de-identification***
+***Important points to consider before de-identification***
If your Tier 1 dataset is not originally anonymized we recommend
preparing a tabular file associating StudyUIDs from DICOM images with
corresponding clinical “episode” and “timepoint events”, in case the
dataset contains multiple episode/timepoints.
-Tools: This can be done using the [**DICOM
-tags extractor**](https://bio.tools/dicom_tags_extractor) tool
-(Figure 7). For more information, see further below section
+Tools: This can be done using the DICOM
+tags extractor tool ([Figure 7](#fig_datatools)). For more information, see section
[5.3.3.2](#bookmark=id.e3irrt7bxs08) Step 2 on imaging data
preparation.
-If your imaging data are not already de-identified, you may use the
-[**Lethe EUCAIM
-Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer/)
-(Figure 7). In this case, you must ensure the following:
+If your imaging data are not already de-identified, you may use the Lethe DICOM Anonymizer ([Figure 7](#fig_datatools)).
+However, even if your dataset has already been anonymized using your own methods, we strongly recommend using the Lethe DICOM Anonymizer, which is the official de-identification tool in EUCAIM. The main reasons are the following:
+- **Unique Patient ID Generation**: Lethe DICOM Anonymizer automatically assigns a hashed PatientID to each patient. This mechanism ensures that the PatientID remains unique across the entire EUCAIM ecosystem, preventing any ID collisions between different DHs. This hash is generated using two components:
+ - The original Patient ID.
+ - The specific SiteID of the Data Holder.
+- **How to obtain your SiteID**: The SiteID is a required input for Lethe and can be retrieved from your User Profile in the EUCAIM Dashboard (UUID). To access it, you must log in with your institutional account, which must be properly registered in LS-AAI. Coordinate with your local IT department to ensure your institution is correctly integrated into the LS-AAI system. Google or similar personal accounts cannot be used to retrieve the SiteID.
+- **Synchronizing Clinical Data**: To ensure your clinical data matches the hashed PatientIDs generated for the DICOM images, you can provide a CSV file during the anonymization process. The only requirement is that the first column must be the original PatientID. Lethe will then output:
+ - The anonymized DICOM images.
+ - A modified CSV file where the original IDs are replaced by the new hashed IDs.
+
+The use of the Lethe DICOM Anonymizer requires:
-- the patient ID linking clinical and imaging data must be identical and
+- The patient ID linking clinical and imaging data must be identical and
listed as the first variable in the clinical dataset for tabular data;
-- your raw imaging data are in DICOM format;
+- Your raw imaging data are in DICOM format;
-- the tool requires as input the SITE_ID, the unique identifier of the
- data provider, which you can see in your user profile from the
- [EUCAIM Dashboard](https://dashboard.eucaim.cancerimage.eu/)
- ([Figure](https://eucaim.gitbook.io/handbook/datapreparation#fig_dataanon)
- 9). In case your Life Science account is not
- assigned to a known organization, then this will be empty and so you
- can create a ticket in the Helpdesk to request one;
+- The tool requires as input the SITE_ID, the unique identifier of the data holder, which you can see
+ in your user profile from the EUCAIM
+ Dashboard. If your Life Science account is not assigned to a known organization,
+ this field will be empty, and you can create a ticket in the Helpdesk to
+ request one;
+
+While using the Lethe DICOM Anonymizer tool is not mandatory, we strongly recommend its use to ensure secure and unique hashed PatientIDs within the EUCAIM infrastructure.
Special attention must be given to **embedded text** in images, which
may contain patient-identifiable information, as well as **craniofacial
images** that pose a risk of patient re-identification. You may need to
apply additional de-identification techniques to mitigate this risk.\
-Tools: Tools such as the [**DICOM defacing
-anonymisation**](https://bio.tools/dicom_defacing_anonymation) tool
-from the EUCAIM catalogue (Figure 7) may be used to remove facial
+Tools: Tools such as the DICOM defacing
+anonymisation tool from the EUCAIM catalogue ([Figure 7](#fig_datatools)) may be used to remove facial
features from your DICOM images. For 2D ultrasounds and mammography
-**dataset**, you may use the [**Trace4MedicalImage
-cleaning**](https://bio.tools/trace4medicalimagecleaning) tool, that
-detects and removes encapsulated text in DICOM files. [The Lethe
-EUCAIM
-Anonymizer](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer)
-tool also provides options to remove burned-in PHI pixel data from the
-images.
+**datasets**, you may use the Trace4MedicalImage
+cleaning tool, which detects and removes encapsulated text in DICOM files. The Lethe DICOM Anonymizer tool also provides options to remove burned-in PHI pixel data from the images.
**Re-identification risk assessment (optional)**: Even if no automatic
re-identification risk analysis on a combination of clinical and imaging
@@ -348,14 +340,8 @@ metadata is possible at this Tier, you should carefully assess that no
direct or indirect identifiers are present in your data.\
Tools: For assessing the risk of
re-identification of patients based on your **imaging metadata** before
-sharing your dataset, you may use the [EUCAIM **Wizard
-tool**](https://bio.tools/eucaim_wizard_tool). Extraction of imaging
-metadata to feed the wizard tool is possible by using the [**DICOM
-tags extractor**](https://bio.tools/dicom_tags_extractor) tool
-(Figure
-[7](https://eucaim.gitbook.io/handbook/datapreparation#fig_dataanon)).
-You may also use the [ARX Anonymization
-Tool](https://bio.tools/arx) to assess the re-identification risk of
+sharing your dataset, you may use the EUCAIM Wizard tool. You can extract the imaging metadata that feeds the Wizard tool with the DICOM tags extractor tool ([Figure 7](#fig_datatools)).
+You may also use the ARX Anonymization Tool to assess the re-identification risk of
your clinical metadata, but it requires the specification of the
quasi-identifier attributes by the DH. In addition, the creation of
generalization hierarchies is necessary if you want to perform a
@@ -391,16 +377,12 @@ can help you to assess the degree of compliance of your dataset to each
EUCAIM DQ dimension:
- the **accuracy** and **integrity** of your imaging dataset may be
- assessed using the [**DICOM File integrity
- checker**](https://bio.tools/dicom_file_integrity_checker_by_gibi230).
+ assessed using the DICOM File integrity checker.
- **Uniqueness** can be addressed with two EUCAIM tools that search for
- image duplicates: the [**Image duplicates
- checker**](https://bio.tools/dicom_image_similarity-duplicate_checker),
- capable of detecting duplicate or visually similar DICOM series by
- combining metadata analysis, hash-based comparison, and pixel-level
- similarity metrics; the [**Image duplicate check
- tool**](https://bio.tools/image_duplicate_check_tool), that
+ image duplicates: the Image duplicates
+ checker, capable of detecting duplicate or visually similar DICOM series by combining metadata analysis, hash-based comparison, and pixel-level
+ similarity metrics; and the Image duplicate check tool, which
detects duplicate DICOM images by analyzing pixel data.
#### **Step 4: Data transfer**
@@ -416,27 +398,24 @@ to Section 6 of the Handbook for further information.
### **EUCAIM Common Data Model and Hyperontology**
-The [**EUCAIM Common Data
-Model**](https://eucaim.gitbook.io/eucaim-common-data-model/1.-introduction)
-defines a standardized structure for representing clinical and imaging
+The **EUCAIM Common Data
+Model** defines a standardized structure for representing clinical and imaging
metadata across the EUCAIM platform. It ensures that data contributed by
different partners can be understood and used in a consistent way.
**Key features:**
-- It is based on the conceptual model of [mCode
- specification](https://ascopubs.org/doi/10.1200/CCI.20.00059)
+- It is based on the conceptual model of the mCODE
+ specification.
-- The current version of the EUCAIM CDM Data Dictionary is available
- [here](https://docs.google.com/spreadsheets/d/1ox9PdvfCDxpDmEnFzC1M6OFhUhXpjQzg/edit?usp=sharing&ouid=115998150174651530097&rtpof=true&sd=true).
+- The current version of the EUCAIM Common Data Model - Data Dictionary is available here.
- Supports multimodal data (i.e. imaging and clinical).
- Facilitates efficient querying, tool compatibility, and federated
analysis and learning.
-The [**EUCAIM**
-**hyperontology**](https://hyperontology.eucaim.cancerimage.eu/)
+The EUCAIM hyperontology
is a common semantic meta-model that supports and maintains semantic
interoperability and ensures consistent mapping and harmonization with
the EUCAIM CDM entities (tables and attributes). It provides rich
@@ -474,8 +453,7 @@ above:
In order to have interoperable data that can be queried and processed,
we need you to provide us with information on your dataset structure
-using another tabular template file
-([EUCAIM_example_file_patients_datasets_CDM_v6](https://docs.google.com/spreadsheets/d/1zAReu8-40cAdH8Z7jH3kaHyYkrCILd2X/edit?usp=drive_link&ouid=105979482259582415027&rtpof=true&sd=true))
+using another tabular template file (EUCAIM_example_file_patients_datasets_CDM_v6)
*in addition to* your source dataset.
- **How the tabular template file is organized:**
@@ -638,8 +616,7 @@ and skip step 3. It is important that you can still link the
(anonymized) PatientID with the episodes and timepoints.
Tools: To assist you retrieving all PatientID
-and StudyUID from your imaging dataset, you may use the [**DICOM tags
-extractor tool**](https://bio.tools/dicom_tags_extractor) and its
+and StudyUID from your imaging dataset, you may use the DICOM tags extractor tool (https://bio.tools/dicom_tags_extractor) and its
“dicom_tags_selection” script. A template csv input file called
“imaging_studies_episodes.csv”, provided with the tool, allows to
retrieve the following attributes from your imaging dataset (cf tool
@@ -777,80 +754,67 @@ part edited manually by the data holder.
#### **Step 3: image annotation (optional)**
You may want to annotate your imaging data to enrich your dataset. We
-recommend using the [**MITK (Medical Imaging Interaction Toolkit)
-Workbench**](https://bio.tools/mitk) that ensures the output format
-will be in the required format to be compliant with EUCAIM. Using it
+recommend using the MITK (Medical Imaging Interaction Toolkit) Workbench, which ensures that the output is in the format required for EUCAIM compliance. Using it
would avoid the burden (and the risk) of additional conversion
procedures. Data can be also annotated using the DICOM Viewers from
reference nodes environments after transferring the data (Step 7).
Your imaging raw data must be in DICOM and your annotations in DICOM-SEG
format. If you have existing annotation files that are not in DICOM-SEG,
-you may use the EUCAIM [**Annotation Seg
-converter**](https://hub.docker.com/r/mariov687/dicomseg) tool to
-convert them.
+you may use the EUCAIM Annotation Seg
+converter tool to convert them.
#### **Step 4: De-identification**
You must ensure that no identifiable information (direct or indirect) is
-present in the dataset you will share (**Figure 9**).
+present in the dataset you will share.
-The official tool for de-identification in EUCAIM is [**Lethe EUCAIM
-Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer/). This tool ensures the specific PatientID code system.
-Even if you are already anonymizing data using your own methods, we strongly recommend using the EUCAIM tool. The main reasons are:
-- **Unique Patient ID Generation**: Lethe Anonymizer automatically assigns a hashed PatientID to each patient. This 32mechanism ensures that the PatientID remains unique across the entire EUCAIM ecosystem, preventing any ID collisions between different DHs. This hash is generated using two components:
+If your imaging data are not already de-identified, you may use the Lethe DICOM Anonymizer ([Figure 7](#fig_datatools)).
+However, even if your dataset has already been anonymized using your own methods, we strongly recommend using the Lethe DICOM Anonymizer, which is the official de-identification tool in EUCAIM. The main reasons are the following:
+- **Unique Patient ID Generation**: Lethe DICOM Anonymizer automatically assigns a hashed PatientID to each patient. This mechanism ensures that the PatientID remains unique across the entire EUCAIM ecosystem, preventing any ID collisions between different DHs. This hash is generated using two components:
- The original Patient ID.
- The specific SiteID of the Data Holder.
- **How to obtain your SiteID**: The SiteID is a required input for Lethe and can be retrieved from your User Profile in the EUCAIM Dashboard (UUID). To access this, you must log in with your institutional account, which must be properly registered in LS-AAI. You have to coordinate with your local IT department to ensure your institution is correctly integrated into the LS-AAI system. Google accounts or similar can’t be used to retrieve this SiteID.
- **Synchronizing Clinical Data**. To ensure your clinical data matches the hashed PatientIDs generated for the DICOM images, you can provide a CSV file during the anonymization process. The only requirement is that the first column must be the original PatientID. Lethe will then output:
- The anonymized DICOM images.
- - A modified CSV file where the original IDs are replaced by the new hashed IDs.”
+ - A modified CSV file where the original IDs are replaced by the new hashed IDs.
-([Figure
-7](https://eucaim.gitbook.io/handbook/datapreparation#bookmark=kix.br72yai62sd4)). The use of [**Lethe EUCAIM
-Anonymizer**](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer/) requires:
+The use of the Lethe DICOM Anonymizer requires:
- The patient ID linking clinical and imaging data must be identical and
listed as the first variable in the clinical dataset for tabular data;
- Your raw imaging data are in DICOM format;
-- The tool requires as input the SITE_ID
- (**[Figure](https://eucaim.gitbook.io/handbook/datapreparation#fig_dataanon)
- 9**), the unique identifier of the data provider, which is you can see
- in your user profile from the [EUCAIM
- Dashboard](https://dashboard.eucaim.cancerimage.eu/). In case your
- Life Science account is not assigned to a known organization, then
+- The tool requires as input the SITE_ID, the unique identifier of the data holder, which you can see
+ in your user profile from the EUCAIM
+ Dashboard. In case your Life Science account is not assigned to a known organization, then
this will be empty and so you can create a ticket in the Helpdesk to
request one;
+While using the Lethe DICOM Anonymizer tool is not mandatory, we strongly recommend its use to ensure secure and unique hashed PatientIDs within the EUCAIM infrastructure.
+
Special attention should be given to **embedded text** in images, that
may contain patient-identifiable information, as well as **skull and
head images** that pose a risk of patient re-identification. You may
need to apply additional de-identification techniques to mitigate this
risk.\
-Tools: Tools such as the [**DICOM defacing
-anonymisation**](https://bio.tools/dicom_defacing_anonymation) tool
-from the EUCAIM catalogue (Figure 7) may be used to remove facial
+Tools: Tools such as the DICOM defacing
+anonymisation tool
+from the EUCAIM catalogue ([Figure 7](#fig_datatools)) may be used to remove facial
features from your DICOM images. For 2D ultrasounds and mammography
-**dataset**, you may use the [**Trace4MedicalImage
-cleaning**](https://bio.tools/trace4medicalimagecleaning) tool, that
-detects and removes encapsulated text in DICOM files. [The Lethe
-EUCAIM
-Anonymizer](https://harbor.eucaim.cancerimage.eu/harbor/projects/3/repositories/lethe-dicom-anonymizer)
-tool also provides options to remove burned-in PHI pixel data from the
-images.
+**datasets**, you may use the Trace4MedicalImage
+cleaning tool, which detects and removes encapsulated text in DICOM files. The Lethe DICOM Anonymizer
+tool also provides options to remove burned-in PHI pixel data from the images.
**Re-identification risk assessment for imaging and clinical data
(optional)**: Before sharing your dataset, you should carefully assess
that no direct or indirect identifiers are present in your data.\
Tools: Extraction of imaging metadata to feed
-the wizard tool is possible by using the [**DICOM tags
-extractor**](https://bio.tools/dicom_tags_extractor) tool (Figure
-[7](https://eucaim.gitbook.io/handbook/datapreparation#fig_dataanon)).
+the Wizard tool is possible by using the DICOM
+tags extractor tool ([Figure 7](#fig_datatools)).
Based on the EUCAIM CDM structure, ready-to-use hierarchies can be
-imported in the [EUCAIM **Wizard
-tool**](https://bio.tools/eucaim_wizard_tool) to initiate an
+imported in the EUCAIM Wizard tool to initiate an
analysis that is specifically tailored to the vocabulary and
classification used in EUCAIM clinical metadata as well. The process and
rationale is identical to the imaging metadata risk analysis, but the
@@ -862,7 +826,7 @@ clinical and imaging information independently will work cumulatively
for the overall data value.
You must ensure that no identifiable information (direct or indirect) is
-present in the dataset you will share (Figure 9).
+present in the dataset you will share.
#### **Step 5: Data quality assessment**
@@ -885,31 +849,23 @@ dataset is**:
the degree of compliance of your dataset to these principles. Some tools
from the EUCAIM catalogue can help you to do so:
-- The [**DICOM File integrity
- checker**](https://bio.tools/dicom_file_integrity_checker_by_gibi230)
+- The DICOM File integrity checker
can check the **accuracy** and **integrity** of your imaging dataset.
- For 2D ultrasounds and/or mammography **datasets,** **validity**
- assessment is possible using the [**Trace4MedicalImage
- cleaning**](https://bio.tools/trace4medicalimagecleaning) tool,
+ assessment is possible using the Trace4MedicalImage cleaning tool,
that detects and removes encapsulated text in DICOM files.
- **Uniqueness** can be addressed with two EUCAIM tools that search for
- image duplicates: the [**Image duplicates
- checker**](https://bio.tools/dicom_image_similarity-duplicate_checker),
- capable of detecting duplicate or visually similar DICOM series by
- that combining metadata analysis, hash-based comparison, and
- pixel-level similarity metrics; the [**Image duplicate check
- tool**](https://bio.tools/image_duplicate_check_tool), that
+ image duplicates: the Image duplicates
+ checker, capable of detecting duplicate or visually similar DICOM series by combining metadata analysis, hash-based comparison, and pixel-level
+ similarity metrics; and the Image duplicate check tool, which
detects duplicate DICOM images by analyzing pixel data.
-- The
- [**DIQCT**](https://bio.tools/data_integration_quality_check_tool_diqct)
+- The DIQCT
may help you assess various aspects of your dataset’s quality, both
for imaging and clinical data, such as its **completeness, uniqueness,
- validity, consistency, integrity.**
-
-> ·
+ validity, consistency, integrity.**
#### **Step 6: Data conversion to EUCAIM Common Data Model**
@@ -923,8 +879,8 @@ a\) the mapping between the source metadata (clinical and imaging) and
the EUCAIM CDM.
b\) the actual transformation of all the clinical and imaging data to a
-format compliant with the EUCAIM CDM through the use of the [**EUCAIM
-ETL**](https://bio.tools/eetl_toolset).
+format compliant with the EUCAIM CDM through the use of the EUCAIM
+ETL.
For your imaging dataset:
@@ -935,8 +891,7 @@ For your imaging dataset:
> EUCAIM CDM.
>
> \- Extract in a tabular csv file all the 75 mandatory attributes (list
-> available here:
-> )
+> available here)
> present in your dataset. You may already have such file, especially if
> you used the Wizard tool on step 3 “de-identification” for
> re-identification risk assessment of imaging data. If not, you may use
@@ -944,8 +899,7 @@ For your imaging dataset:
>
> Finally, share the **two above-mentioned csv files** as well as the
> **file from step 2 on PatientID/StudyUID correspondence** with the ETL
-> ingestion support team through the [EUCAIM
-> helpdesk](https://help.cancerimage.eu/).
+> ingestion support team through the EUCAIM helpdesk.
| **Source series Description** | **EUCAIM series description** |
|---------------------------------------|-------------------------------|
@@ -958,8 +912,7 @@ For your imaging dataset:
**Table 6: Example of correspondence between the Series Description from
the source images and the Series Description from the EUCAIM standard.**
The part in blue corresponds to the part edited manually by the data
-holder. See
-[**here**](https://docs.google.com/document/d/1mnTkf2fvERgaRyQPDFebZHLwB8aBRaIZRkwlMBr3ZXQ/edit?tab=t.0)
+holder. See here
for the list of all possible SeriesDescription currently known in the
EUCAIM vocabulary.
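
The de-identification steps in this change describe a hashed PatientID derived from the original Patient ID and the SiteID, plus a clinical CSV whose first column is rewritten with the same hashed IDs. The sketch below illustrates that idea only. It is not the Lethe DICOM Anonymizer implementation: the choice of SHA-256, the `|` separator, and the function names `hashed_patient_id` and `replace_ids_in_csv` are all assumptions made for illustration, and the SiteID values are made up.

```python
import csv
import hashlib
import io


def hashed_patient_id(original_patient_id: str, site_id: str) -> str:
    """Derive a pseudonymous PatientID from the original ID and the SiteID.

    ASSUMPTION: SHA-256 over "site_id|patient_id" is illustrative only; the
    hashing scheme actually used by Lethe is not described in this section.
    """
    payload = f"{site_id}|{original_patient_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def replace_ids_in_csv(csv_text: str, site_id: str) -> str:
    """Return a copy of the CSV with first-column IDs replaced by hashed IDs.

    The header row is kept unchanged; the first column of every other row
    is expected to hold the original PatientID.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row_index, row in enumerate(reader):
        if row_index > 0 and row:
            row = [hashed_patient_id(row[0], site_id)] + row[1:]
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical SiteIDs (UUIDs); real values come from the EUCAIM Dashboard.
    site_a = "11111111-1111-1111-1111-111111111111"
    site_b = "22222222-2222-2222-2222-222222222222"

    # The same local PatientID at two sites yields two different hashed IDs.
    print(hashed_patient_id("P001", site_a) != hashed_patient_id("P001", site_b))

    clinical = "PatientID,age\nP001,63\nP002,58\n"
    print(replace_ids_in_csv(clinical, site_a))
```

Including the SiteID in the hashed input is what keeps two data holders that both use a local ID such as `P001` from colliding. Applying the same function to the images and to the CSV is what keeps clinical and imaging records linkable after anonymization.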