System Requirements and Internet Connectivity
Before beginning the installation process, ensure your system meets the following requirements:
- Storage Space: A minimum of 70 GB of free disk space is required for installation and initial operations. Up to 150 GB of free space is recommended to accommodate future updates and data management needs. 500 GB is preferable if you intend to download the example data from the Git LFS server.
- Memory: At least 16 GB of RAM is essential for smooth performance during installation and runtime.
- Internet Connection: A stable internet connection is necessary throughout the installation process and for initial task executions. This ensures timely downloads and updates.
- Ports 80 and 3000: If, for some reason, these ports are not open in Docker, you may need to open ports 80 and 3000 for Docker, or you might have to disable the firewall.
- Docker Memory: In the Docker settings under Resources, set a memory limit of at least 8 GB.
Please note that the demo server includes large datasets, which may result in lengthy download times. The time required depends on your network speed and stability; if any step appears to stall, you can interrupt it with Ctrl+C and restart it as needed to continue the download process.
The datasets you can work with are embedded in the repository at the links below:
- IMC Tonsil
- IMC PDAC
- MIBI TNBC
- Full Application Package (use submodules for easy cloning of all components): spex_bundle
- Backend: spex_backend
- Frontend: spex_frontend
- Common Modules: spex_common
- Image Downloader from OMERO for Processing: spex_ms_omero_image_downloader
- OMERO Session Management (providing image information and API access): spex_ms_omero_sessions
- Task Queue Manager: spex_ms_pipeline_manager
- Script Execution and Environment Management: spex_ms_job_manager
- Data Clustering: spex_clustering
- Image Segmentation: spex_segmentation
- Spatial Transcriptomics: spex_spatial_transcriptomics

These algorithms enable customization of data processing parameters and are integrated with the spex_ms_job_manager microservice for executing analytical tasks.
Ubuntu
- Open Terminal and run:
sudo apt update
sudo apt install git-lfs
Windows
- Download and install Git for Windows.
- If you have already installed Git for Windows, check whether it includes Git LFS:
- Open PowerShell as administrator and run:
git lfs install
If you see the following output:
Git LFS initialized.
proceed to the next step, Bundle installation. If not:
- Download and install Git LFS, following the instructions for Windows installation.
- Go to the folder where you will deploy the project. To navigate to a project folder in the terminal, you can use the cd command, which stands for "change directory."
cd my_project
- To set up Git LFS, open the terminal and run the following command:
git lfs install
- For the production bundle of the application, clone the repository:
git clone https://github.com/Genentech/spex_demo.git .
git lfs pull
- Wait for the process to complete. The total size of all downloaded project files should be around 10 gigabytes.
Set executable permissions (Ubuntu):
chmod -R +x .
- Download and install Docker Desktop
Ubuntu
- Execute the application demo script:
./app_demo.sh up
Windows
Wait for the download to complete. If the download does not complete or hangs due to an unstable connection, stop the process with Ctrl+C and start it again.
After the download is complete and the necessary images and containers are created, you should see 11 containers in the Docker application.
As a result, a browser window should open asking you to log in. If the page is not displayed, wait 5-15 minutes and reload it; the containers may not all have finished starting yet.
To open the application, go to "http://127.0.0.1:3000" in your browser. On first start, allow about 5 minutes for services such as the OMERO server and the frontend to initialize.
For more information about SPEX, see the project documentation.
Example workflow
- Log in to the application using username root and password omero.
- To initiate a test process, first select Project 1 and click the Analyze button. Next, click the "Add Process" button and enter a name for the process, such as "test". Then open the process by clicking on it in the process list, and proceed to create the first task.
- Blocks can be connected to each other; the entry point is the choice of what to work with, an image or an AnnData file. You then select the subsequent related blocks, which transform the data to achieve the desired result.
- All tasks are executed sequentially. You can start all tasks with the "Start ▶" button, or start an individual task with the "Play ▶" button in its block. You can also delete a block if it is not needed.
The following describes the syntax for custom algorithms: how to integrate your own processing mechanisms into the system, and their parameters.
All scripts are located inside the project at the path /demo_data/scripts/, which contains three folders with the scripts already used in the project. Any of them can be used as a reference example. Below you will also find descriptions of the modifications that can be applied.
- Project root
  - `manifest.json` — the main project manifest (if used). It may contain global settings or link individual pipeline stages.
  - Other files unrelated to a specific stage.
- Stage folders (e.g., `load_anndata`, `clustering`, `dimensionality_reduction`)
  - `manifest.json` (inside the stage folder)
    - Describes the key parameters required by this stage.
    - Contains the stage name, description, execution order (`stage`), input parameters (`params`), and expected output (`return`).
    - Defines dependencies (`depends_and_script`, `depends_or_script`) and environment settings (`conda`, `libs`, `conda_pip`).
    - Conda Environments: if `conda` is specified, the system creates or uses a Conda environment with the requested Python version and installs the necessary libraries (`libs` via Conda, `conda_pip` via pip).
    - Parameter Definitions: the `params` section in `manifest.json` must include precise details for all possible parameter types to ensure smooth communication between the client and the script.
    - Result Transfer: the result of each script is passed to the next stage through the structured output format defined in `return`.
  - `app.py` (the executable script for this stage)
    - This file must always be named `app.py` to maintain consistency.
    - It contains the core logic: reading data, processing it, and returning results.
    - Typically includes a `run(**kwargs)` function that:
      - Imports necessary dependencies (e.g., `scanpy`, `numpy`).
      - Reads parameters from `kwargs` (e.g., file paths, method choices, metrics).
      - Executes core functions (e.g., data loading, clustering, dimensionality reduction).
      - Returns results in the format defined in `manifest.json`.
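For orientation, here is a minimal sketch of a stage's `app.py` following this structure. The parameter names (`adata_path`, `resolution`) and the Leiden clustering call are illustrative assumptions, not taken from the actual demo scripts:

```python
# Minimal illustrative app.py for a clustering stage.
# Parameter names and processing steps are hypothetical examples.
import scanpy as sc


def run(**kwargs):
    # Read parameters passed in from the manifest's "params" section
    adata_path = kwargs.get("adata_path")               # hypothetical "file" parameter
    resolution = float(kwargs.get("resolution", 1.0))   # hypothetical "float" parameter

    # Core logic: load the data and cluster it
    adata = sc.read_h5ad(adata_path)
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata, resolution=resolution)

    # Return results in the shape declared under "return" in manifest.json
    return {"adata": adata}
```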
Each stage's `manifest.json` should specify parameters under `params` using the following structure:
"params": {
"parameter_name": {
"name": "Parameter Name",
"label": "User-friendly label",
"description": "Detailed description of the parameter",
"type": "TYPE",
"required": true,
"default": "default_value",
"enum": ["option1", "option2"],
"min": 0,
"max": 100
}
}
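To illustrate how such a declaration might be consumed, the sketch below merges manifest defaults with user-supplied values and enforces `required`, `enum`, `min`, and `max`. This is a hypothetical helper; the actual validation logic in `spex_ms_job_manager` may differ:

```python
import json


def resolve_params(manifest_path, user_values):
    """Merge user-supplied values with manifest defaults and validate them.

    Hypothetical sketch; assumes min/max apply to numeric parameters.
    """
    with open(manifest_path) as fh:
        params = json.load(fh)["params"]

    resolved = {}
    for name, spec in params.items():
        value = user_values.get(name, spec.get("default"))
        if spec.get("required") and value is None:
            raise ValueError(f"Missing required parameter: {name}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name} must be one of {spec['enum']}")
        if "min" in spec and value is not None and value < spec["min"]:
            raise ValueError(f"{name} must be >= {spec['min']}")
        if "max" in spec and value is not None and value > spec["max"]:
            raise ValueError(f"{name} must be <= {spec['max']}")
        resolved[name] = value
    return resolved
```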
| Type | Description & Manifest Example |
|---|---|
| `string` | A text field. `{ "type": "string" }` |
| `int` | Integer input. `{ "type": "int", "min": 0, "max": 100 }` |
| `float` | Floating-point number. `{ "type": "float", "min": 0.0, "max": 1.0 }` |
| `enum` | A dropdown selection. `{ "type": "enum", "enum": ["option1", "option2"] }` |
| `file` | A file selector. `{ "type": "file" }` |
| `dataGrid` | A structured table/grid input. `{ "type": "dataGrid" }` |
| `omero` | Image selection from OMERO. `{ "type": "omero" }` |
| `channel` | A single channel selector. `{ "type": "channel" }` |
| `channels` | Multi-channel selector. `{ "type": "channels" }` |
| `job_id` | A job selector. `{ "type": "job_id" }` |
| `process_job_id` | A process job selector. `{ "type": "process_job_id" }` |
These types are mapped to their respective React components in the UI, ensuring proper handling on the client side.
OMERO supports various image formats, excluding those with a time dimension (e.g., time-lapse TIFFs).
| Format | Description |
|---|---|
| TIFF (.tif, .tiff) | Multi-channel, multi-dimensional image storage widely used in microscopy. |
| OME-TIFF (.ome.tif, .ome.tiff) | A standardized format supporting structured metadata and multiple channels (CXY or CYXZ). |
Unsupported Formats:
- TIFF stacks with time dimension (TXYC or TXYZC) → Not supported for direct OMERO ingestion in this workflow.
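Since time-lapse stacks are not supported, you may want to inspect a TIFF's axes before importing it into OMERO. A minimal sketch using the `tifffile` package (an assumption; it is not bundled with SPEX), where the file path is illustrative:

```python
import tifffile


def has_time_dimension(path):
    """Return True if the first image series contains a T (time) axis."""
    with tifffile.TiffFile(path) as tif:
        axes = tif.series[0].axes  # e.g., "CYX", "TXYC", "TZCYX"
        return "T" in axes


if has_time_dimension("example.ome.tif"):
    print("Time-lapse stack: not supported for direct OMERO ingestion here.")
```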
H5AD is a format used for storing annotated multi-dimensional data, particularly in single-cell transcriptomics and spatial biology.
- Observations (Cells or Regions) (`adata.obs`)
  - `fov`, `volume`, `min_x`, `max_x`, `min_y`, `max_y` — metadata defining spatial boundaries and properties.
- Variables (Genes or Features) (`adata.var`)
  - Contains `n_vars` variables (e.g., genes), with no additional annotations.
- Spatial Data (`adata.obsm`, `adata.uns`)
  - Stores spatial coordinates and additional metadata.
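You can inspect these slots with the `anndata` package before wiring a file into a pipeline. A brief sketch; the file path is illustrative, and the `"spatial"` key in `obsm` is a common convention rather than a guarantee for every dataset:

```python
import anndata as ad

adata = ad.read_h5ad("example.h5ad")

# Observations: per-cell/region metadata such as fov, volume, min_x, max_x, ...
print(adata.obs.columns.tolist())

# Variables: genes or features (no extra annotations in this layout)
print(adata.var_names[:10])

# Spatial coordinates and additional metadata, if present
if "spatial" in adata.obsm:
    print(adata.obsm["spatial"][:5])
print(adata.uns.keys())
```

For example, a stage that consumes such a file might declare the following parameter in its `manifest.json`: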
```json
{
  "params": {
    "adata": {
      "name": "AnnData File",
      "label": "Spatial transcriptomics dataset",
      "description": "H5AD file containing spatial gene expression data",
      "type": "file",
      "required": true
    }
  }
}
```
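A matching stage script could then read this parameter from `kwargs`. Again, a hypothetical sketch rather than an actual demo script:

```python
import anndata as ad


def run(**kwargs):
    # "adata" corresponds to the "file" parameter declared in the manifest above
    adata = ad.read_h5ad(kwargs["adata"])
    # ...process the dataset here...
    return {"adata": adata}
```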