SOMA Curation

Overview

soma-curation is a lightweight Python package used at Phenomic to streamline the curation and management of single-cell RNA sequencing (scRNA-seq) atlases built on TileDB-SOMA. It is still in its early stages; the goal is to help bioinformaticians and ML practitioners organize their SOMA atlases and access their raw data more easily. The package bakes in assumptions about how raw data is organized in storage, mirroring practices at Phenomic.

Installation

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install soma-curation
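
To confirm the install worked, one option is to query the installed distribution's version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of soma-curation.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "soma-curation") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found_version = installed_version()
    if found_version is None:
        print("soma-curation is not installed in this environment")
    else:
        print(f"soma-curation {found_version}")
```

If this reports that the package is missing, check that the virtual environment created above is the one currently activated.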

Quick Start

Define your schema:

from soma_curation.schema import load_schema

# loads the default schema
schema = load_schema()

Organize your raw data:

# You can simulate this structure with the following commands
from soma_curation.utils import test_dummy_structure
test_dummy_structure()

It should give you a structure like this:

raw_data/
├── study_1/
│   ├── mtx/
│   │   ├── sample_1/
│   │   └── sample_2/
│   ├── cell_metadata/
│   │   └── study_1.tsv.gz
│   └── sample_metadata/
│       └── study_1.tsv.gz
└── study_2/
    └── ...
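
To inspect this layout without installing anything, the sketch below recreates the tree above in a temporary directory and then walks it to list each study's samples. It uses only the standard library; `build_layout` and `list_samples` are hypothetical helpers written for illustration, not functions from soma-curation, and the metadata files it creates are empty placeholders.

```python
import tempfile
from pathlib import Path


def build_layout(root: Path, studies: dict[str, list[str]]) -> Path:
    """Create raw_data/<study>/{mtx,cell_metadata,sample_metadata} under root."""
    raw = root / "raw_data"
    for study, samples in studies.items():
        for sample in samples:
            (raw / study / "mtx" / sample).mkdir(parents=True, exist_ok=True)
        for kind in ("cell_metadata", "sample_metadata"):
            meta_dir = raw / study / kind
            meta_dir.mkdir(parents=True, exist_ok=True)
            # Empty placeholder; a real file would be a gzipped TSV.
            (meta_dir / f"{study}.tsv.gz").touch()
    return raw


def list_samples(raw: Path) -> dict[str, list[str]]:
    """Map each study directory to the sorted sample names under its mtx/ folder."""
    return {
        study.name: sorted(p.name for p in (study / "mtx").iterdir() if p.is_dir())
        for study in sorted(raw.iterdir())
        if (study / "mtx").is_dir()
    }


with tempfile.TemporaryDirectory() as tmp:
    raw_data = build_layout(Path(tmp), {"study_1": ["sample_1", "sample_2"]})
    layout = list_samples(raw_data)
    print(layout)
```

Keeping one metadata file per study, named after the study, lets a loader locate it from the study name alone.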

Create and use your collection:

# H5adCollection is assumed to be importable from the same module as MtxCollection
from soma_curation.collection import H5adCollection, MtxCollection

# For MTX files
collection = MtxCollection(
    storage_directory="path/to/raw_data",
    db_schema=schema
)

# Access AnnDatas from MTX files
adata = collection.get_anndata(study_name="study_1", sample_name="sample_1")

# For H5AD files
h5ad_collection = H5adCollection(
    storage_directory="path/to/h5ad_files"
)

# List all H5AD files
h5ad_files = h5ad_collection.list_h5ad_files()

# Access AnnData directly from an H5AD file
adata = h5ad_collection.get_anndata(filename="file1.h5ad")
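
Conceptually, listing the H5AD files in a storage directory amounts to a filtered directory scan. The sketch below shows that idea with the standard library only; `find_h5ad_files` is a hypothetical helper, and it is not necessarily how `H5adCollection.list_h5ad_files()` is implemented.

```python
import tempfile
from pathlib import Path


def find_h5ad_files(storage_directory: Path) -> list[str]:
    """Return the sorted names of files ending in .h5ad directly under the directory."""
    return sorted(
        p.name
        for p in storage_directory.iterdir()
        if p.is_file() and p.suffix == ".h5ad"
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("file1.h5ad", "file2.h5ad", "notes.txt"):
        (root / name).touch()  # empty placeholders, not real AnnData files
    found = find_h5ad_files(root)
    print(found)
```

The scan is deliberately non-recursive here; whether the package also looks into subdirectories is not stated in this README.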

Create a TileDB-SOMA Experiment

from soma_curation.atlas.crud import AtlasManager

# Create an atlas
# `schema` is the object returned by load_schema() in the first step
am = AtlasManager(atlas_name="...", db_schema=schema, storage_directory="...")
am.create()

# Delete an atlas
# am.delete()

Create a Dataset according to your schema and standardize it

from soma_curation.dataset.anndataset import AnnDataset

# Create a Phenomic Dataset
# Original anndata is stored under the `.artifact` attribute
dataset = AnnDataset(
    atlas_manager=am,
    collection=collection
)
dataset.standardize()
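
What "standardize" does is defined by the package and your schema; this README does not spell it out. As a rough mental model only, the sketch below assumes a schema is a mapping from required column names to Python types, and shows one plausible standardization pass over metadata rows: keep only schema columns, fill missing ones with None, and coerce values to the declared type. The schema shape, `standardize_rows`, and the column names are all assumptions made for illustration, not the soma-curation API.

```python
from typing import Any, Callable

# Hypothetical schema: column name -> callable used to coerce values.
OBS_SCHEMA: dict[str, Callable[[Any], Any]] = {
    "barcode": str,
    "sample_name": str,
    "n_genes": int,
}


def standardize_rows(
    rows: list[dict[str, Any]], schema: dict[str, Callable[[Any], Any]]
) -> list[dict[str, Any]]:
    """Return rows restricted to schema columns, in schema order, with coerced values."""
    standardized = []
    for row in rows:
        clean = {}
        for column, caster in schema.items():
            value = row.get(column)
            # Missing values stay None instead of being coerced.
            clean[column] = None if value is None else caster(value)
        standardized.append(clean)
    return standardized


rows = [
    {"barcode": "AAAC", "sample_name": "sample_1", "n_genes": "1200", "extra": 1},
    {"barcode": "AAAG", "sample_name": "sample_1"},
]
result = standardize_rows(rows, OBS_SCHEMA)
print(result)
```

Driving the pass from the schema rather than from the input means every output row has the same columns in the same order, which is what a fixed on-disk schema generally requires.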

Ingest your data into the TileDB-SOMA Experiment using the standard TileDB-SOMA ingestion syntax, as described in the TileDB-SOMA documentation.

Documentation

For detailed documentation, including API reference and usage examples, visit our documentation site.

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This work was inspired by the TileDB-SOMA and CellxGene Census teams. We extend our gratitude to the TileDB team for their valuable feedback and support.

Setup

Setup instructions follow. If you work in VS Code, we highly recommend installing the Python extension.

Cloning the Repository

Clone this repository to your local machine:

git clone https://github.com/PhenomicAI/soma-curation.git
cd soma-curation/

Developer Setup

You only need to create a virtual environment once.

Create and activate a virtual environment, then install the package with its development dependencies:

virtualenv venv
source venv/bin/activate
pip install ".[dev]"
