Issue -inf Likelihood with any bmodels/rmodels #5

Open
MaximePolicarpo opened this issue Nov 7, 2020 · 9 comments

Comments

@MaximePolicarpo

Hi,

I am trying to run BadiRate, but I always end up with the same error.

The command line always returns the message: "WARN: Try using a more complex model, or changing the starting values. See the -rmodel, -bmodel and -start_val options".

I tried modifying the -rmodel, -bmodel, -ep and -start_val options, but it prints this WARN every time and my output file always looks like this:

OUTPUT

##Family Turnover Rates
	#Likelihood: -inf

##Execution time (seconds): 562

END OUTPUT

I attached two files: my species tree and the table with the gene counts per OG for each species.
Here is an example of a command line I tried:

perl BadiRate.pl -anc -treefile Species_Tree_rooted_ultrametric_newick.txt -sizefile table_OGs_protein_counts.txt –bmodel FR –ep CMAP -rmodel BDI –family -unobs -start_val 1 –root_dist 0 > output.bd

Would anyone know why I never get any results?

Thanks for any help provided,

Maxime Policarpo
table_OGs_protein_counts.txt
Species_Tree_rooted_ultrametric_newick.txt
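
One detail visible in the command as it appears in this thread: some options (`–bmodel`, `–ep`, `–family`, `–root_dist`) start with an en dash rather than an ASCII hyphen. Whether those characters were in the shell command actually run, or were introduced when the text was pasted or rendered, is not known, and nothing in the thread establishes that they are related to the warning. As a minimal sketch for ruling this out, the snippet below flags tokens that begin with a non-ASCII dash; the function name `find_suspicious_dashes` is hypothetical and not part of BadiRate.

```python
import unicodedata


def find_suspicious_dashes(command: str):
    """Return (token, character name) pairs for tokens starting with a non-ASCII dash."""
    hits = []
    for token in command.split():
        first = token[0]
        # Unicode category "Pd" is dash punctuation; the ASCII hyphen is also "Pd",
        # so it is excluded explicitly.
        if first != "-" and unicodedata.category(first) == "Pd":
            hits.append((token, unicodedata.name(first)))
    return hits


# The command exactly as it appears in the post above (redirect omitted).
command = (
    "perl BadiRate.pl -anc -treefile Species_Tree_rooted_ultrametric_newick.txt "
    "-sizefile table_OGs_protein_counts.txt –bmodel FR –ep CMAP -rmodel BDI "
    "–family -unobs -start_val 1 –root_dist 0"
)

for token, name in find_suspicious_dashes(command):
    print(f"{token!r} starts with {name}")
```

If any tokens are reported, retyping those options with a plain `-` removes one possible source of confusion before looking at the data or the model.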

@fgvieira
Owner

fgvieira commented Nov 9, 2020

Hi Maxime,

From what you are telling me, it seems that your data is too heterogeneous for the model you are using. That is, you have gene families with very different rates, and the model cannot explain them. Usually, the way to go is to use more complex models but, sometimes, no model can fit the data. This can happen, for example, if you have transposable elements in your dataset, as these typically have much higher rates than other genes.

I'd remove all transposable element families from the dataset before running BadiRate. If you are very interested in them, then I'd try splitting the dataset in two and analyzing the two parts separately.

@MaximePolicarpo
Author

Hi Filipe,

My dataset is composed only of olfactory receptors, which indeed have very different rates of duplication and loss across my species. Would you suggest splitting the species tree or splitting the count table? (For example, if I have 30 ortholog groups, running one analysis with the first 15 OGs and a second analysis with the other 15.)

Thanks a lot for your help,

Maxime

@fgvieira
Owner

fgvieira commented Nov 9, 2020

Have you tried an FR model?

@MaximePolicarpo
Author

MaximePolicarpo commented Nov 9, 2020

Yes, I tried almost every possible combination of -bmodel and -rmodel :(

@MaximePolicarpo
Author

In fact, even when I try with only one subfamily from my file (OG_1), I end up with the same warning. I tried launching BadiRate with the same species tree but another dataset, and this time it worked fine. I don't really know how to handle this problem.
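
Testing one family at a time, as described above, means cutting the count table down to a single row plus its header. Here is a minimal sketch of that step, assuming the size file is tab-delimited with one header row and the family ID in the first column (an assumption about the file layout, not something confirmed in this thread). The function name `split_size_file` and the example table are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def split_size_file(size_file: Path, out_dir: Path) -> list[Path]:
    """Write one table per family, each keeping the original header row."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with size_file.open(newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)  # assumed: first row holds the column names
        for row in reader:
            if not row:
                continue  # skip blank lines
            family_id = row[0]  # assumed: first column is the family ID
            out_path = out_dir / f"{size_file.stem}_{family_id}.txt"
            with out_path.open("w", newline="") as out:
                writer = csv.writer(out, delimiter="\t", lineterminator="\n")
                writer.writerow(header)
                writer.writerow(row)
            written.append(out_path)
    return written


# Tiny made-up table (species names and counts are invented for illustration).
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "counts.txt"
    table.write_text("FAM_ID\tspA\tspB\tspC\nOG_1\t3\t0\t12\nOG_2\t1\t1\t2\n")
    for path in split_size_file(table, Path(tmp) / "per_family"):
        print(path.name, "->", path.read_text().splitlines()[1])
```

Each resulting file can then be passed to `-sizefile` on its own, which makes it easier to see whether the warning is tied to particular families or to the whole table.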

@fgvieira
Owner

Did you try with only one family (e.g. OG_1) but just using parsimony?

@MaximePolicarpo
Author

Yes, I tried:

/home/casane/perl5/perlbrew/perls/perl-5.16.0/bin/perl /home/casane/badirate-1.35/BadiRate.pl -anc -treefile Species_Tree_rooted_ultrametric_newick.txt -sizefile table_OGs_protein_counts_OG1.txt –bmodel FR –ep CWP -rmodel BDI -start_val 1 –root_dist 0 > output.bd

/home/casane/perl5/perlbrew/perls/perl-5.16.0/bin/perl /home/casane/badirate-1.35/BadiRate.pl -anc -treefile Species_Tree_rooted_ultrametric_newick.txt -sizefile table_OGs_protein_counts_OG1.txt –bmodel FR –ep CWP -rmodel BDI -start_val 0 –root_dist 0 > output.bd

Still the same warning message :/

@YiyanYang0728

YiyanYang0728 commented Jun 19, 2021

I also encountered this issue, and the gene families in my dataset are not related to transposable elements. I tried all the combinations with ep=ML and ended up with the same warning (even FR did not work).
In the end, I modified the original code (Badirate.pl) on lines 1894-1895, in the function for the BDI model, like this:

1894         #$error=2;
1895         return -$y; # "-inf";

Then I got some results but with Likelihood=-1e+26.

I want to ask if this result is usable or not. Is it possible to give a prior file which is able to fit highly heterogeneous data? I'd really appreciate your help and suggestion.

Thanks,
Yiyan

@fgvieira
Owner

@YiyanYang0728 not sure if you have been contacted already, but I'd suggest contacting the main author directly (Pablo Librado), as stated in the README.
