-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Maker annotation workflow #47
Add Maker annotation workflow #47
Conversation
ping @abretaud |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super cool, thanks for working on this!
@@ -0,0 +1,7 @@ | |||
version: 1.2 | |||
workflows: | |||
- name: "ANNOTATION-OF-EUKARYOTIC-GENOMES-WITH-MAKER" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to have this name in upper case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, except that we named all the covid-19 workflows in uppercase, but there's no technical reason to do it. If there's only a single workflow in the folder we could also just call it main
? The only concern is that the name should be reasonably short (if you need to type it manually, or include it as a reference in a paper), and if we add multiple workflows we should know which workflow does what.
I will have a look at the test error, that looks like a Galaxy bug |
Thanks! |
OK, so the busco task is failing, and the test framework then fails trying to download the outputs.
This runs for about 4 hours and then fails with
Unfortunately we don't have the stderr, but I wouldn't be surprised if it fails for CPU or memory limits. The job itself ran for almost 4 hours ... would reducing the input size accelerate this job ? Can we maybe use the tool test data for busco ? |
Yes, I guess that it should be a memory or cpu issue, because I tested it in Galaxy and worked fine. I'll reduce the size of the input files. |
HI @mvdbeek, now it triggers this error: |
It's maker that is failing now:
I have updated planemo to produce more meaningful output, let's see what's going on there |
Also maker is on the biocontainer skip list: https://github.com/galaxyproject/tools-iuc/blob/master/.tt_biocontainer_skip#L1, that might be a problem too |
Now it launched this error: |
You can see the actual error in the artifact uploaded in the test job: Results (powered by Planemo)Summary
Errored s
|
@gallardoalba
I would say augustus needs tar as a dependency, ideally in the Conda recipe, unless the usage of tar is within the Galaxy wrapper. |
Thanks! |
I think you also need to update the version of the conda dependency in the tool wrapper, which still uses 3.3.3 -> https://github.com/galaxyproject/tools-iuc/blob/master/tools/augustus/macros.xml#L10 |
.github/workflows/workflow_test.yml
Outdated
@@ -138,6 +138,7 @@ jobs: | |||
galaxy-branch: ${{ env.GALAXY_BRANCH }} | |||
chunk: ${{ matrix.chunk }} | |||
chunk-count: ${{ needs.setup.outputs.chunk-count }} | |||
planemo-version: 'https://github.com/galaxyproject/planemo/archive/refs/heads/master.zip' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
master
is a moving target. Ideally this should be changed to pin a specific release before the PR is merged.
Thanks, I'm working on it. |
I think it ran until completion, but it seems that the test framework thinks that the output is a path and not a URL, I will take a look at this. |
As alternatives, I'm working on funannotate and braker, maybe finder at some point. But I don't think they can be really interchanged: for some organisms, and depending on evidence data, one tool will be better than others... |
So what are we going to do now ? Since maker is available in bioconda and homebrew-science and there was some communication with the authors about this I tend to think we could merge this and deprecate the workflow should there be any requests that concern licensing ? What do you think about this @bwlang ? |
I guess we need to decide about 2 things: 1 is easily dealt with I think: maybe someone could contact the author and check about the seeming inconsistency between the LICENSE.txt and the web page. Last time I said something about a license problem someone got pretty mad at me on twitter - I hope that doesn't happen again... It's a bummer we didn't think about 2 before. I'm sorry to bring this up at the last minute, after people did work to build something cool and useful ;( If we think it's ok, it might be a good idea to put up some kind of warning to avoid setting a trap for a commercial galaxy user who might not think to check all the licenses of tools used in a workflow during installation. For.a second there I was like "oh cool - maker is open source now! Maybe I can use it to finally "finish" my deer genome project!" ... alas, no. Bioconda does allow academic only tools - they just require that redistribution be allowed. They jump through some hoops for older GATK to make the user does the download on installation. I think that's a good - less likely that someone would put their organization at risk of lawsuits because they did not notice an unusual license. If we think it's not ok, I hope we can meld the README and CONTRIBUTING.md (#61 ) to make a checklist to avoid disappointing contributors in the future. |
@gallardoalba could you add a note both in the readme and the description of the workflow itself that maker has an academic only license ? I think that's the best we can do for now. Also ping @galaxyproject/iwc, are y'all OK with this ? I suppose maker is in the IUC so in a sense we've already implicitly agreed that academic only is OK ? That said, we may want to also display licenses of the underlying tool in the wrapper (the license tag is now meant for the wrapper itself). If we do this we can have something semi-automatic for warning non-academic users. |
Hmm - i'm not sure there was much thought about this unusual license question when this went into IUC. I see from the history that it's been there a while, but it was switched from repeatmasker's database to dfam - probably for a similar reason. |
There's surely a contradiction with the GPL license, and the "non commercial" thing... I'd say as long as it's gpl, the rest makes no sense.. |
Yeah - that does seem contradictory to me - but I'm no expert. |
@bwlang Thanks for bringing this up! IANAL, but in my opinion:
|
Ah, I forgot:
|
Yeah, I agree the intent is also clearly "not for commercial use" |
I included some information in the README.md, but not sure how to include it in the workflow @mvdbeek |
As requested. This is Mark Yandell's official position on using MAKER for bioconda and galaxy -->
|
* Maker: add an option to warn users about license (as asked by authors in galaxyproject/iwc#47 (comment)) * lint * simpler check
Has been solved the problem with the license @abretaud? |
@gallardoalba I think so: we now have a clear statement from the authors, and a checkbox in the maker tool to force the user to acknowledge it. If everyone's ok with that I think we can move on. We only need to update the tools in the workflow before merging |
|
||
As a possible approach to assess whether changes in the workflow contribute to its improvement, one possibility is to use the [ParseVal](https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/aegean_parseval/aegean_parseval/0.16.0) tool, in order to compare the obtained result with a standard annotation. | ||
|
||
If you only want to know is an annotation looks reasonable based on the current test data, you can just count the genes in the output GFF, and/or compare the total length of genes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you only want to know is an annotation looks reasonable based on the current test data, you can just count the genes in the output GFF, and/or compare the total length of genes. | |
If you only want to know if an annotation looks reasonable based on the current test data, you can just count the genes in the output GFF, and/or compare the total length of genes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a link to https://training.galaxyproject.org/training-material/topics/genome-annotation/tutorials/funannotate/tutorial.html#comparing-annotations here = different tools to compare annotation results
Co-authored-by: Anthony Bretaudeau <[email protected]>
Maybe it would be a good idea to expose the new checkbox as a workflow parameter as well? |
I’m not in favor… I think this is a trap just waiting to be sprung on an an unsuspecting junior employee to put their employer (and career) in jeopardy. It sucks that scientists have to be thinking about such things, but here we are. I can see that the checkbox is present now and galaxy could show logs in a trial saying that summer intern X checked the box and that galaxy and bioconda did not contribute to that behavior. I don’t know if that would hold up. In any case, this still seems like a bad situation to be in for galaxy folks, for the intern, for the company; for everybody but the university’s lawyers I guess. Just my opinion… take it for what it’s worth. |
I do agree with @bwlang, I think it was a mistake to add maker to the IUC in the first place. IANAL, but the statement in #47 (comment) does not seem bulletproof. Maybe it would free us from legal action against us, but who knows which third party could try to cash in without the original author's permission or intent. |
So I'll close it. I'll work in the Funnotate workflow if you agree @abretaud. Thanks all! |
Ok, no problem to close it. Licenses are really a problem for genome annotation :( |
No description provided.