Prototype for storing single-cell data #1020
gemma-core/src/test/java/ubic/gemma/persistence/util/ListUtilsTest.java
...va/ubic/gemma/persistence/service/expression/experiment/ExpressionExperimentServiceImpl.java
.../main/java/ubic/gemma/persistence/service/expression/experiment/ExpressionExperimentDao.java
gemma-core/src/main/java/ubic/gemma/model/expression/bioAssayData/DataVector.java
...-core/src/main/resources/ubic/gemma/model/expression/designElement/CompositeSequence.hbm.xml
I'm in the process of merging the dev branch to get this work up-to-date.
@@ -130,6 +130,13 @@
<!-- cannot be non-null because subsets and generic experiments don't have curation details -->
<column name="CURATION_DETAILS_FK" not-null="false" sql-type="BIGINT" unique="true"/>
</many-to-one>
<set name="singleCellExpressionDataVectors" lazy="true" fetch="select" inverse="true"
cascade="all-delete-orphan">
We should remove the -delete-orphan and manage vectors the same way we do for raw and processed ones.
That includes bulk insertion, removal, etc.
Add basic support in SingleCellDescriptive and DataVectorDescriptive for floats, ints and longs. Add missing conversion logic for scale types. Reuse those to implement aggregation of floats, ints and longs. No matter the input type, we always aggregate into doubles, so we don't have to support those types in raw or processed vectors. Support writing MEX and tabular format from those vectors. Support loading integer data from MEX and all the supported types from AnnData. Add an option to prefer single-precision when loading data vectors, which might imply losing some precision. Add an option to use double precision for MEX. The default is single-precision now.
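As a rough illustration of "no matter the input type, we always aggregate into doubles", here is a minimal sketch. The class and method names are hypothetical and are not taken from SingleCellDescriptive or DataVectorDescriptive; it only shows the widening idea and what the single-precision option can cost.

```java
// Hypothetical helper for illustration only; not the Gemma API.
final class WideningAggregation {

    // Sum single-precision values into a double accumulator.
    static double sum(float[] values) {
        double total = 0.0;
        for (float v : values) {
            total += v; // float is widened to double before the addition
        }
        return total;
    }

    // Sum 32-bit integers into a double accumulator.
    static double sum(int[] values) {
        double total = 0.0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Sum 64-bit integers into a double accumulator.
    // Longs above 2^53 in magnitude may not be exactly representable as doubles.
    static double sum(long[] values) {
        double total = 0.0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Narrow doubles to floats, as an option preferring single precision might do.
    static float[] toSinglePrecision(double[] values) {
        float[] result = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = (float) values[i]; // may lose precision
        }
        return result;
    }
}
```

Because every overload returns a double, downstream raw or processed vectors only ever see one numeric type, which matches the stated design choice.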
We only support one matrix format for single cell data, so vectors that are not stored in double require conversion.
…and data The COUNT_FAST aggregation method does not even need the data to be populated, so we can nearly double the throughput by omitting it.
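To see why a count might not need the data at all, consider the following sketch. It assumes, and this is only an assumption, that COUNT_FAST counts the stored entries of a sparse vector per sample. The names `cellIndices` and `sampleOffsets` and the layout are hypothetical, not Gemma's actual storage format.

```java
// Hypothetical sketch; assumes the count is the number of stored entries per sample.
final class SparseCounts {

    /**
     * @param cellIndices   ascending cell positions of the stored entries of one vector
     * @param sampleOffsets position of the first cell of each sample, ascending
     * @return number of stored entries falling in each sample
     */
    static int[] countStoredPerSample(int[] cellIndices, int[] sampleOffsets) {
        int[] counts = new int[sampleOffsets.length];
        int sample = 0;
        for (int cell : cellIndices) {
            // advance to the sample that contains this cell
            while (sample + 1 < sampleOffsets.length && cell >= sampleOffsets[sample + 1]) {
                sample++;
            }
            counts[sample]++;
        }
        return counts;
    }
}
```

Only the indices are read here, so under this assumption the values array never has to be loaded or decoded.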
Data in GEO is always retrieved as string arrays of known size, so replace all the List<Object> with String[]. Move the logic for parsing arrays of strings to QuantitationTypeConversionUtils.
Those values should be handled gracefully and without producing a warning. Move conversion logic back in GeoConverterImpl since this is meant to be tailored to data encountered in GEO.
Enforce dependency convergence now that it has been achieved. Remove unused jboss-3jb3x dependency
…g elements() A DAL can have more elements than actual values, so its length must always be taken from size(), not elements().length. This is only problematic if the array was created by calling add().
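The pitfall can be reproduced with a small stand-in for a growable list. The class below is a hypothetical, self-contained imitation of the size()/elements() contract described above, not the real DoubleArrayList, and its initial capacity and growth factor are arbitrary choices.

```java
import java.util.Arrays;

// Hypothetical stand-in for a growable double list; not the real DoubleArrayList.
final class GrowableDoubles {
    private double[] elements = new double[4]; // backing array, may have spare capacity
    private int size = 0;                      // number of valid values

    void add(double value) {
        if (size == elements.length) {
            // grow by doubling, which leaves unused slots at the end
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[size++] = value;
    }

    int size() {
        return size;
    }

    // Exposes the backing array, including unused trailing slots.
    double[] elements() {
        return elements;
    }

    // Safe copy: the length comes from size(), not from elements().length.
    double[] toArray() {
        return Arrays.copyOf(elements, size);
    }
}
```

After five calls to add(), size() is 5 while elements().length is 8 in this sketch, so any code that trusts the backing array's length would read three values that were never added.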
…ompressedStringListType
TODO
SingleCellExpresionDataMatrix, we need to finish the work and write it to file). I think MEX is a pretty decent output format for this.
REST API