Hi,
I am trying to find DEGs between two conditions while controlling for the sample-driven effect. Following the tutorial, I used this script for my analysis:
part = de.test.partition(
    data=data_part,
    parts="sample"
)
test_part = part.wald(
    formula_loc="~ 1 + condition",
    factor_loc_totest="condition"
)
Next, I checked how diffxpy combines p-values from the different groups and found this:
res = pd.DataFrame({
    "gene": self.gene_ids,
    # return minimal pval by gene:
    "pval": np.min(self.pval.reshape(-1, self.pval.shape[-1]), axis=0),
    # return minimal qval by gene:
    "qval": np.min(self.qval.reshape(-1, self.qval.shape[-1]), axis=0),
    # return maximal logFC by gene:
    "log2fc": np.asarray(logfc),
    # return mean expression across all groups by gene:
    "mean": np.asarray(self.mean)
})
return res
Could you explain why the minimum p-value across groups is used?
I suspect that this could inflate the number of significant genes.
Would another method, such as Fisher's method
(https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.combine_pvalues.html), be more appropriate?
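To make the comparison concrete, here is a minimal sketch of what I have in mind. The p-value matrix is made up for illustration (rows are partitions, columns are genes) and is not diffxpy output; the only library call beyond NumPy is `scipy.stats.combine_pvalues`. My concern with the minimum is that, for k independent tests under the null, the chance that the smallest p-value falls below α is 1 - (1 - α)^k rather than α.

```python
import numpy as np
from scipy import stats

# Hypothetical p-values: one row per partition (sample), one column per gene.
pvals = np.array([
    [0.04, 0.50, 0.20],
    [0.30, 0.60, 0.03],
    [0.20, 0.70, 0.04],
])

# What the summary appears to do: minimum p-value per gene across partitions.
min_p = pvals.min(axis=0)

# Possible alternative: Fisher's method, applied to each gene separately.
# combine_pvalues returns (statistic, combined p-value); keep the p-value.
fisher_p = np.array([
    stats.combine_pvalues(pvals[:, j], method="fisher")[1]
    for j in range(pvals.shape[1])
])

print("min:   ", min_p)
print("fisher:", fisher_p)
```

Fisher's method assumes the per-partition tests are independent, which I am not sure holds here, so this is only meant to show the kind of combination I am asking about.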