QUESTION: the p-value for multiple partitions of a data set #168

@jingxinfu

Hi,
I am trying to find differentially expressed genes (DEGs) between two conditions while controlling for sample-driven effects. Following the tutorial, I used this script for the analysis:

import diffxpy.api as de

part = de.test.partition(
    data=data_part,
    parts="sample"
)
test_part = part.wald(
    formula_loc="~ 1 + condition",
    factor_loc_totest="condition"
)

Next, I checked how diffxpy combines the p-values from the different partitions and found this:

        res = pd.DataFrame({
            "gene": self.gene_ids,
            # return minimal pval by gene:
            "pval": np.min(self.pval.reshape(-1, self.pval.shape[-1]), axis=0),
            # return minimal qval by gene:
            "qval": np.min(self.qval.reshape(-1, self.qval.shape[-1]), axis=0),
            # return maximal logFC by gene:
            "log2fc": np.asarray(logfc),
            # return mean expression across all groups by gene:
            "mean": np.asarray(self.mean)
        })

        return res

Could you explain why the minimum p-value across groups is chosen?
I wonder whether taking the minimum inflates the number of significant genes, since each gene gets one chance per partition to reach significance.
Would another method, such as Fisher's method (https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.combine_pvalues.html), be better?
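
To make the concern concrete, here is a minimal sketch that does not use diffxpy at all. It assumes independent partitions and null genes (uniform p-values), and compares the false-positive rate of the minimum rule with Fisher's method. The functions `fisher_combine` and `simulate_null` are hypothetical helpers written for this illustration; Fisher's method is implemented with the closed-form chi-square tail for even degrees of freedom, so only the standard library is needed (`scipy.stats.combine_pvalues` with `method="fisher"` should give the same combined p-value).

```python
import math
import random


def fisher_combine(pvals):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k d.o.f. under H0."""
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)  # this is X / 2
    # Closed-form chi-square survival function for even d.o.f. 2k:
    # sf(X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


def simulate_null(n_genes=2000, n_parts=5, alpha=0.05, seed=0):
    """Fraction of null genes called significant by each combination rule."""
    rng = random.Random(seed)
    hits_min = hits_fisher = 0
    for _ in range(n_genes):
        # Under H0 p-values are uniform; 1 - random() lies in (0, 1], avoiding log(0).
        pvals = [1.0 - rng.random() for _ in range(n_parts)]
        hits_min += min(pvals) < alpha
        hits_fisher += fisher_combine(pvals) < alpha
    return hits_min / n_genes, hits_fisher / n_genes


if __name__ == "__main__":
    rate_min, rate_fisher = simulate_null()
    print(f"min p-value rule: {rate_min:.3f}")
    print(f"Fisher's method:  {rate_fisher:.3f}")
```

Under these assumptions the minimum rule should flag roughly 1 - 0.95^5 ≈ 0.226 of null genes at alpha = 0.05 with five partitions, while Fisher's method should stay near 0.05. Real partitions may not be independent, so this is only an illustration of the concern, not a statement about diffxpy's behavior on actual data.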
