Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question regarding log2fc returned by differential expression Wald testing #198

Open
YousufYassouf opened this issue Apr 26, 2021 · 1 comment

Comments

@YousufYassouf
Copy link

Hi,
I was trying to find DEGs and their fold change between two conditions (Conditions =['1Con', '2Con']) and to regress out two covariates (SampleType =['Frozen', 'Fresh'], n_genes).

I have two questions:

  1. How to interpret the resulting log2fc, is it 1Con/2Con or the 2Con/1Con?
  2. 'caught 141 linalg singular matrix errors' appeared in the output, but I still got the summary table, is it correct or should I do something about it?

Any help would be appreciated!

I used this script to perform the analysis:

test = de.test.wald(data=fadata,
formula_loc="~1+Conditions+SampleType+SampleType:Conditions+n_genes+n_genes:Conditions",
                    factor_loc_totest="Conditions", as_numeric=['n_genes'],
                    sample_description=fadata.obs, noise_model="nb", training_strategy="DEFAULT", dtype="float64")
test.summary().to_csv(sc.settings.figdir /'de.test.wald.summary.tsv', sep="\t", index=False)

I got this output:

Diffxpy de.test.wald testing
training location model: True
training scale model: True
iter   0: ll=8489257375.931549
iter   1: ll=8486685806.088095, converged: 0.00% (loc: 4.67%, scale update: False), in 26.07sec
caught 141 linalg singular matrix errors
iter   2: ll=8486679830.490733, converged: 0.00% (loc: 98.58%, scale update: False), in 24.84sec
iter   3: ll=8486678948.803598, converged: 0.00% (loc: 98.58%, scale update: False), in 4.52sec
iter   4: ll=8486678815.294401, converged: 0.00% (loc: 98.65%, scale update: False), in 4.52sec
iter   5: ll=8486678785.821270, converged: 0.00% (loc: 98.88%, scale update: False), in 4.45sec

Fitting 24506 dispersion models: (progress not available with multiprocessing)
iter   6: ll=70569325.724172, converged: 3.62% (loc: 3.62%, scale update: True), in 1788.10sec
caught 123 linalg singular matrix errors
iter   7: ll=70296497.765427, converged: 3.62% (loc: 35.44%, scale update: False), in 23.67sec
iter   8: ll=70260749.227150, converged: 3.62% (loc: 40.07%, scale update: False), in 18.26sec
iter   9: ll=70251928.577203, converged: 3.62% (loc: 58.57%, scale update: False), in 17.01sec
iter  10: ll=70249711.995841, converged: 3.62% (loc: 81.31%, scale update: False), in 12.83sec
iter  11: ll=70249379.829412, converged: 3.62% (loc: 89.92%, scale update: False), in 7.47sec

Fitting 23619 dispersion models: (progress not available with multiprocessing)
iter  12: ll=54446548.515894, converged: 20.45% (loc: 20.45%, scale update: True), in 1559.48sec
caught 46 linalg singular matrix errors
iter  13: ll=54435626.591855, converged: 20.45% (loc: 87.07%, scale update: False), in 20.72sec
iter  14: ll=54434658.754988, converged: 20.45% (loc: 94.59%, scale update: False), in 6.43sec
iter  15: ll=54434444.213969, converged: 20.45% (loc: 96.54%, scale update: False), in 5.81sec
iter  16: ll=54434378.910931, converged: 20.45% (loc: 97.34%, scale update: False), in 5.26sec
iter  17: ll=54434362.876335, converged: 20.45% (loc: 97.69%, scale update: False), in 5.24sec

Fitting 19494 dispersion models: (progress not available with multiprocessing)
iter  18: ll=48554309.314322, converged: 67.49% (loc: 67.49%, scale update: True), in 1260.92sec
caught 12 linalg singular matrix errors
iter  19: ll=48554130.906475, converged: 67.49% (loc: 97.40%, scale update: False), in 11.38sec
iter  20: ll=48554108.553771, converged: 67.49% (loc: 98.90%, scale update: False), in 5.42sec
caught 11 linalg singular matrix errors
iter  21: ll=48554100.295546, converged: 67.49% (loc: 99.10%, scale update: False), in 3.43sec
caught 44 linalg singular matrix errors
iter  22: ll=48554096.867598, converged: 67.49% (loc: 99.37%, scale update: False), in 3.20sec
caught 27 linalg singular matrix errors
iter  23: ll=48554095.283522, converged: 67.49% (loc: 99.52%, scale update: False), in 2.71sec

Fitting 7967 dispersion models: (progress not available with multiprocessing)
iter  24: ll=46569302.387164, converged: 90.46% (loc: 90.46%, scale update: True), in 527.66sec
caught 116 linalg singular matrix errors
iter  25: ll=46569299.973292, converged: 90.46% (loc: 99.49%, scale update: False), in 6.18sec
caught 11 linalg singular matrix errors
iter  26: ll=46569299.754370, converged: 90.46% (loc: 99.87%, scale update: False), in 2.71sec
caught 6 linalg singular matrix errors
iter  27: ll=46569299.752718, converged: 90.46% (loc: 99.92%, scale update: False), in 1.41sec
iter  28: ll=46569299.752676, converged: 90.46% (loc: 99.93%, scale update: False), in 1.50sec
caught 3 linalg singular matrix errors
iter  29: ll=46569299.752662, converged: 90.46% (loc: 99.96%, scale update: False), in 1.19sec

Fitting 2337 dispersion models: (progress not available with multiprocessing)
iter  30: ll=46113421.509132, converged: 97.82% (loc: 97.82%, scale update: True), in 175.80sec
caught 30 linalg singular matrix errors
iter  31: ll=46113419.735521, converged: 97.82% (loc: 99.93%, scale update: False), in 5.41sec
caught 6 linalg singular matrix errors
iter  32: ll=46113419.712236, converged: 97.82% (loc: 100.00%, scale update: False), in 1.23sec
iter  33: ll=46113419.710071, converged: 97.82% (loc: 100.00%, scale update: False), in 1.25sec
iter  34: ll=46113419.709985, converged: 97.82% (loc: 100.00%, scale update: False), in 1.01sec
iter  35: ll=46113419.709984, converged: 97.82% (loc: 100.00%, scale update: False), in 1.02sec

Fitting 534 dispersion models: (progress not available with multiprocessing)
iter  36: ll=45997409.837720, converged: 99.64% (loc: 99.64%, scale update: True), in 57.52sec
caught 12 linalg singular matrix errors
iter  37: ll=45997409.834046, converged: 99.64% (loc: 99.98%, scale update: False), in 2.16sec
iter  38: ll=45997409.834046, converged: 99.64% (loc: 100.00%, scale update: False), in 1.53sec

Fitting 89 dispersion models: (progress not available with multiprocessing)
iter  39: ll=45989396.952243, converged: 99.92% (loc: 99.92%, scale update: True), in 32.57sec
caught 1 linalg singular matrix errors
iter  40: ll=45989396.952202, converged: 99.92% (loc: 100.00%, scale update: False), in 1.58sec
iter  41: ll=45989396.952202, converged: 99.92% (loc: 100.00%, scale update: False), in 1.13sec

Fitting 20 dispersion models: (progress not available with multiprocessing)
iter  42: ll=45950000.848035, converged: 99.97% (loc: 99.97%, scale update: True), in 28.84sec
iter  43: ll=45950000.848035, converged: 99.97% (loc: 100.00%, scale update: False), in 1.55sec

Fitting 7 dispersion models: (progress not available with multiprocessing)
iter  44: ll=45949070.365447, converged: 100.00% (loc: 100.00%, scale update: True), in 24.72sec
iter  45: ll=45949070.365447, converged: 100.00% (loc: 100.00%, scale update: False), in 1.35sec

Fitting dispersion models: 0.00% in 0.01sec
iter  46: ll=45949070.365447, converged: 100.00% (loc: 100.00%, scale update: True), in 1.25sec

and this is the head of the resulting tsv file

gene	pval	qval	log2fc	mean	zero_mean	grad	coef_mle	coef_sd	ll
A1BG	0.312478681	0.787738151	0.161516688	0.092773702	FALSE	4.056426074	0.161516688	0.159912024	-6781.629286
A1BG-AS1	0.48030134	0.929602236	0.331335826	0.009863314	FALSE	0.009596474	0.331335826	0.469435231	-385.7717783
A1CF	0.894546706	1	-0.21362873	0.001293466	FALSE	0.082358417	-0.21362873	1.611644585	-82.14013579
A2M	0.047744881	0.275626869	-0.763245681	0.01917489	FALSE	0.262260202	-0.763245681	0.385549251	-1230.818608
A2M-AS1	0.524215312	0.957188022	-0.469050349	0.005311979	FALSE	0.000410546	-0.469050349	0.736503355	-560.8241553
A2ML1	0.646698854	1	-0.415191149	0.003174966	FALSE	6.79E-06	-0.415191149	0.905832734	-467.2889809
A2ML1-AS2	1	1	0	0	TRUE	5.03E-127	0	3.37E+63	-7.31E-126
A4GALT	0.21076328	0.658880588	-1.394940411	0.001117579	FALSE	1.47615582	-1.394940411	1.114642066	-27.84762266
AAAS	0.901200251	1	0.024925929	0.046980111	FALSE	11.23234409	0.024925929	0.200780325	-1885.289258
@YousufYassouf
Copy link
Author

Edit: I have used the following script

de.utils.preview_coef_names(sample_description=fadata.obs,
                                  formula="~1+Conditions+SampleType+SampleType:Conditions+n_genes+n_genes:Conditions",
                                  as_numeric='n_genes')

and got this output:

['Intercept', 'Conditions[T.2Con]', 'SampleType[T.frozen PBMC]', 'SampleType[T.frozen PBMC]:Conditions[T.2Con]', 'n_genes', 'n_genes:Conditions[T.2Con]']

it seems that '1Con' is in the intercept and the fold change is estimated as 2Con/1Con, so is there a simple way to choose which coefficient from the 'Conditions' will be in the intercept (maybe like selecting reference) to interpret log2c more easily?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant