Add Workflow Run RO-crate format #39
add encodingFormat for nextflow.config
feat: add wrroc to valid formats
* fx: make getIntermediateOutputFiles work again
* Fix bugs fixes #16 fixes #17
---------
Co-authored-by: fbartusch <[email protected]>
* fix #7 Signed-off-by: fbartusch <[email protected]>
* feat: add README to create
* feat: ignore vscode
* fix: make getIntermediateOutputFiles work again (#18) (#19)
* fx: make getIntermediateOutputFiles work again
* Fix bugs fixes #16 fixes #17
---------
Co-authored-by: fbartusch <[email protected]>
* feat: add README to json
* feat: check first if readme exists
* Add readme to hasPart
Signed-off-by: fbartusch <[email protected]>
---------
Signed-off-by: fbartusch <[email protected]>
Co-authored-by: fbartusch <[email protected]>
* Add getEncodingFormat function that return the encoding format for a file
* handle YAML files manually
Signed-off-by: fbartusch <[email protected]>
* implements #1 Signed-off-by: fbartusch <[email protected]>
Iss7 directory type
* start with metaYaml imports
* merge dev-wrroc into metaYaml (#23)
* add encodingFormat for nextflow.config
* add encodingFormat for main.nf
* feat: add wrroc to valid formats
* fix: make getIntermediateOutputFiles work again (#18)
* fx: make getIntermediateOutputFiles work again
* Fix bugs fixes #16 fixes #17
---------
Co-authored-by: fbartusch <[email protected]>
* feat: add README to crate (#14)
* feat: add README to create
* feat: ignore vscode
* fix: make getIntermediateOutputFiles work again (#18) (#19)
* fx: make getIntermediateOutputFiles work again
* Fix bugs fixes #16 fixes #17
---------
Co-authored-by: fbartusch <[email protected]>
* feat: add README to json
* feat: check first if readme exists
* Add readme to hasPart
Signed-off-by: fbartusch <[email protected]>
---------
Signed-off-by: fbartusch <[email protected]>
Co-authored-by: fbartusch <[email protected]>
---------
Signed-off-by: fbartusch <[email protected]>
Co-authored-by: fbartusch <[email protected]>
* WIP
* only add from meta if meta exists
* remove usage from ext args
* add module name to id
---------
Signed-off-by: fbartusch <[email protected]>
Co-authored-by: fbartusch <[email protected]>
This comment was marked as outdated.
ro-crate-metadata.json
This comment was marked as resolved.
Signed-off-by: Ben Sherman <[email protected]>
I'm currently running tests against nf-core pipelines with some scripts I wrote for testing plugins.
I've tried again (with the current version of the plugin) to run Famke's pipeline locally:
with local files in
The resulting crate is not even readable by ro-crate-py because it has absolute ids in it (
I see three possible ways to fix this:
However, with options 2 and 3 the crate consumer has no way to reconstruct the two input files.
Most of the nf-core pipelines are currently failing with the plugin :(
I'm now checking why it fails, using the nf-core bamtofastq pipeline, as it is the fastest (and simplest?) pipeline that fails. That should make debugging easier.
@simleo this is why I recommend using the original HTTP URLs:
But it is unavoidable that some users will be using local input files, and we'll need to handle that gracefully. As a first iteration I'm inclined to warn about such input files and maybe make them CreativeWork if they aren't included in the crate. I will try a few things.
@simleo I ended up taking the absolute URI approach. That made the resulting crate valid. We can encourage the use of remote URIs as a best practice.

In summary, only input files that are (1) specified directly by a param, (2) local, and (3) not a directory will be copied into the crate. All of these restrictions are designed to prevent explosive data transfers from directories, remote data, and file globs.

@fbartusch I ran bamtofastq with the test profile and it succeeded. Let me know how the rest of your tests go with the latest revision.
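
As an illustration only, the three-part copy rule summarized above can be written as a small predicate. This is a minimal Python sketch, not the plugin's actual code: the function names `is_local` and `should_copy_into_crate`, the way locality is detected (by URI scheme), and the example paths are all assumptions made for the example.

```python
from urllib.parse import urlparse


def is_local(location: str) -> bool:
    """Assumption: a plain path or a file:// URI counts as local;
    any other scheme (http, https, s3, ...) counts as remote."""
    return urlparse(location).scheme in ("", "file")


def should_copy_into_crate(location: str, param_values, is_directory: bool) -> bool:
    """Hypothetical check mirroring the three conditions from the comment."""
    specified_by_param = location in param_values  # (1) named directly by a param, not found via a glob
    return (
        specified_by_param
        and is_local(location)      # (2) local, so no remote transfer is triggered
        and not is_directory        # (3) not a directory, so no bulk copy
    )


# Example with made-up parameter values.
params = {"/data/sample.bam", "https://example.org/ref.fa", "/data/genome_dir"}
for candidate, is_dir in [
    ("/data/sample.bam", False),
    ("https://example.org/ref.fa", False),
    ("/data/genome_dir", True),
    ("/data/matched_by_glob.bam", False),
]:
    print(candidate, should_copy_into_crate(candidate, params, is_dir))
```

Under these assumptions only `/data/sample.bam` qualifies; the remote file, the directory, and the file that no param names directly are all left out of the crate.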
@bentsherman bamtofastq indeed looks good and the validator is happy. I'm now running the tests for the other nf-core pipelines.
@bentsherman Only one pipeline out of 42 I ran fails because of the plugin:
But none of the others passed the validator (I used the latest commit fa8c6c7, not the PyPI release). I think this is the validator version with the fewest remaining bugs, right @simleo? Although the list looks very long at first glance, these seem to be just corner cases.
All of these messages relate to files in the temporary directory
Example: Thanks to the saved effective
Also an edge case in handling null parameter values?
Example: It looks like this:
The effective configuration during runtime is:
One last thing regarding the license.
That's the current development version, so good choice 👍
Workflow RO-Crate says:
where the first appearance of "Crate" here means the root data entity. See also Licensing, Access control and copyright.
This is most likely coming from a collectFile operator, which saves file outputs to work/tmp. If a downstream task uses the collected file then it will show up as a task input. I have added some logic to handle it but haven't tested
I have added some logic to exclude parameters set to null.
I'm surprised the JSON serializer did this instead of just failing. Not sure how it came up with this result. I added some logic to treat durations and memory units as raw numbers.
You're right. I added the license back to the wrroc config options.

@fbartusch these fixes should remove most or all of the validation errors you saw.
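
For readers following along, the parameter handling described in this comment (dropping null parameters, and writing durations and memory sizes as raw numbers) might look roughly like the Python sketch below. It is not the plugin's implementation. `MemorySize`, `to_raw`, and `serialize_params` are hypothetical names, and the choice of milliseconds and bytes as the raw units is an assumption.

```python
import json
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class MemorySize:
    """Hypothetical stand-in for a memory-unit value; stores a size in bytes."""
    num_bytes: int


def to_raw(value):
    """Convert special value types to plain numbers; pass everything else through."""
    if isinstance(value, timedelta):
        return value // timedelta(milliseconds=1)  # assumption: durations become milliseconds
    if isinstance(value, MemorySize):
        return value.num_bytes                     # assumption: memory sizes become bytes
    return value


def serialize_params(params: dict) -> dict:
    """Drop parameters set to None and convert the rest to JSON-friendly values."""
    return {name: to_raw(value) for name, value in params.items() if value is not None}


# Example with made-up parameters.
example = {
    "outdir": "results",
    "email": None,
    "max_time": timedelta(hours=2),
    "max_memory": MemorySize(8 * 1024**3),
}
print(json.dumps(serialize_params(example)))
```

Converting before serialization keeps the JSON encoder from having to guess how to represent these objects, which appears to be what produced the odd output discussed above.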
@bentsherman Indeed, the plugin now produces valid RO-Crates for most of the nf-core pipelines. Great work!
Four other pipelines have missing payloads. These files are in
@fbartusch since we're getting into more obscure errors and this PR is a massive piece of work, I'm going to go ahead and merge it and cut a release. Let's pursue these two issues separately.
@bentsherman testing again with https://github.com/famosab/wrrocmetatest, I've removed the
Note that other parameters from
@simleo I ran with these settings in the

    prov {
        enabled = true
        formats {
            wrroc {
                file = "${params.outdir}/ro-crate-metadata.json"
                overwrite = true
                license = "https://spdx.org/licenses/MIT"
            }
        }
    }

That will set the license for the crate. Setting
@bentsherman I see the license again now. I reinstalled the plugin, something must have gone wrong with that before. Sorry for the noise.
Thanks so much @bentsherman @simleo @fbartusch for putting in all the work to finish this massive PR 🚀
Great job! I'm already telling customers about it...
We worked on a first version of the plugin which is able to render valid RO-crates for any workflow run.
Happy to receive feedback to get this finished up :)
Continues #19 and #33.