QUESTION: the p-value for multiple partitions of a data set #168

Hi,
I am trying to find DEGs between two conditions while controlling for sample-driven effects. Following the tutorial, I used this script for my analysis:
```
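import diffxpy.api as de

# Split the data by sample so that each partition is tested separately.
part = de.test.partition(
    data=data_part,
    parts="sample"
)
# Run a Wald test for the condition effect within each partition.
test_part = part.wald(
    formula_loc="~ 1 + condition",
    factor_loc_totest="condition"
)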
```
Next, I checked how `diffxpy` combines the p-values from the different partitions and found this:
```
        res = pd.DataFrame({
            "gene": self.gene_ids,
            # return minimal pval by gene:
            "pval": np.min(self.pval.reshape(-1, self.pval.shape[-1]), axis=0),
            # return minimal qval by gene:
            "qval": np.min(self.qval.reshape(-1, self.qval.shape[-1]), axis=0),
            # return maximal logFC by gene:
            "log2fc": np.asarray(logfc),
            # return mean expression across all groups by gene:
            "mean": np.asarray(self.mean)
        })

        return res
```
Could you explain why the minimum p-value across partitions is reported?
I suspect this inflates the number of significant genes, because the chance that at least one partition yields a small p-value grows with the number of partitions.
Would another method, such as Fisher's method (https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.combine_pvalues.html), be better?
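
To make the concern concrete, here is a minimal sketch of my own (not `diffxpy` code) that compares three ways of combining per-partition p-values for a single gene: the plain minimum, a Šidák-adjusted minimum, and Fisher's method. The function names and the example p-values are made up for illustration, and the sketch assumes the partitions are independent and all p-values are strictly greater than zero. As far as I understand, `scipy.stats.combine_pvalues(pvals, method="fisher")` computes the same Fisher p-value; I wrote it out by hand only to keep the example dependency-free.

```python
import math


def min_pvalue(pvals):
    """Plain minimum across partitions (no correction for the number of partitions)."""
    return min(pvals)


def sidak_min_pvalue(pvals):
    """Minimum p-value adjusted for taking the best of k independent tests."""
    k = len(pvals)
    return 1.0 - (1.0 - min(pvals)) ** k


def fisher_pvalue(pvals):
    """Fisher's method: X = -2 * sum(log p_i) is chi-squared with 2k dof under the null.

    For an even number of degrees of freedom (2k) the survival function has
    the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    Assumes every p-value is > 0 (log(0) is undefined).
    """
    k = len(pvals)
    half_stat = -sum(math.log(p) for p in pvals)  # equals X / 2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one gene across five sample partitions:
# only one partition shows weak evidence, the others show none.
example_pvals = [0.04, 0.30, 0.60, 0.80, 0.90]

print("min:          ", min_pvalue(example_pvals))
print("Sidak-adjusted:", sidak_min_pvalue(example_pvals))
print("Fisher:        ", fisher_pvalue(example_pvals))
```

In this made-up case the plain minimum would pass a 0.05 threshold, while both the adjusted minimum and Fisher's method would not, which is the behaviour I am asking about.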
