Principles of SCimilarity's integration (i.e., batch effect removal)

Thanks for developing this nice tool!

I have tried SCimilarity on my own dataset, which consists of 6 time-series samples from a cell line cultured under nutrient deprivation. The results were confusing: when I checked the query coherence and the embedding computed by SCimilarity, both looked strange (figure below).

![Image](https://github.com/user-attachments/assets/d7f2c177-2084-418b-a804-ba8b4cd63bc0)

The embedding was blurry, and the most similar cells came from xenograft studies. Most importantly, many of the retrieved cells overlapped between cluster groups, as shown by the Jaccard index between the groups, calculated from the cell indices provided in the result metadata.

LUAD xenograft studies, Jaccard index of the similar cells' indices:

![Image](https://github.com/user-attachments/assets/03722734-8e6d-4de6-a3f4-42524e2d2901)
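A minimal sketch of how such a Jaccard index between cluster groups can be computed. It assumes the indices of the retrieved similar cells have already been collected into one list per cluster; `hits_by_cluster`, `jaccard`, and `pairwise_jaccard` are hypothetical names used only for illustration, not part of the SCimilarity API, and the numbers are toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of cell indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: define the index as 0.0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(hits_by_cluster):
    """Return {(group1, group2): jaccard} for every unordered pair of groups."""
    return {
        (g1, g2): jaccard(hits_by_cluster[g1], hits_by_cluster[g2])
        for g1, g2 in combinations(sorted(hits_by_cluster), 2)
    }


# Hypothetical toy data: reference-cell indices retrieved for each query cluster.
hits_by_cluster = {
    "Cluster1": [10, 11, 12, 13],
    "Cluster2": [12, 13, 14, 15],
    "Cluster3": [100, 101],
}

scores = pairwise_jaccard(hits_by_cluster)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

A value close to 1 means two cluster groups retrieved almost the same reference cells, while a value close to 0 means their hits barely overlap.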

When I scored the cells with signatures derived from the markers of my previous Seurat clustering, I found that the SCimilarity embedding separated Cluster3 into two subclusters. This confused me even more, because my original Seurat UMAP showed no sign of subclusters within Cluster3.

![Image](https://github.com/user-attachments/assets/96b7495d-aed0-41cf-a043-5f9a170540a8)

My original UMAP plot, computed using Harmony-corrected embeddings:

![Image](https://github.com/user-attachments/assets/ba3ac4c4-9ca3-4b16-9703-b7a714ceffb0)

So I started to think about the principle behind integration, or batch effect removal, in SCimilarity. In the tutorial, it seems that we only need to pass the expression matrix to SCimilarity's functions. However, other batch effect removal methods, such as Harmony, which I used with my Seurat object, need sample metadata (e.g., a group.by variable) that identifies the batches. So I am curious how SCimilarity can remove batch effects without such information. In addition, I noticed that in Fig. 2b of the original paper, SCimilarity only wins on cell type cluster coherence and does worse than the other methods at removing study batchiness:

![Image](https://github.com/user-attachments/assets/55b9cc10-bd44-4e08-8e0e-c33be1e1f84a)

Therefore, I wonder how SCimilarity performs integration without group or sample information. Any suggestions for interpreting my SCimilarity results would be greatly appreciated!

Thanks in advance.


