Add Workflow Run RO-crate format (#39)
Signed-off-by: fbartusch <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Co-authored-by: fbartusch <[email protected]>
Co-authored-by: Ben Sherman <[email protected]>
3 people authored Feb 6, 2025
1 parent 9b5e7bb commit 4bbf0c1
Showing 7 changed files with 1,105 additions and 22 deletions.
28 changes: 16 additions & 12 deletions README.md
@@ -26,7 +26,7 @@ prov {
}
```

-Finally, run your Nextflow pipeline. You do not need to modify your pipeline script in order to use the `nf-prov` plugin. The plugin will automatically generate a JSON file with provenance information.
+Finally, run your Nextflow pipeline. You do not need to modify your pipeline script in order to use the `nf-prov` plugin. The plugin will automatically produce the specified provenance reports at the end of the workflow run.

## Configuration

@@ -44,14 +44,16 @@ Create the provenance report (default: `true` if plugin is loaded).

Configuration scope for the desired output formats. The following formats are available:

-- `bco`: Render a [BioCompute Object](https://biocomputeobject.org/). Supports the `file` and `overwrite` options.
-
-*New in version 1.3.0*: additional "pass-through" options are available for BCO fields that can't be inferred from the pipeline. See [BCO.md](./BCO.md) for more information.
+- `bco`: Render a [BioCompute Object](https://biocomputeobject.org/). Supports the `file` and `overwrite` options. See [BCO.md](./BCO.md) for more information about the additional config options for BCO.

- `dag`: Render the task graph as a Mermaid diagram embedded in an HTML document. Supports the `file` and `overwrite` options.

- `legacy`: Render the legacy format originally defined in this plugin (default). Supports the `file` and `overwrite` options.

+*New in version 1.4.0*
+
+- `wrroc`: Render a [Workflow Run RO-Crate](https://www.researchobject.org/workflow-run-crate/). Includes all three profiles (Process, Workflow, and Provenance). See [WRROC.md](./WRROC.md) for more information about the additional config options for WRROC.
+
Any number of formats can be specified, for example:

```groovy
@@ -69,6 +71,8 @@ prov {
}
```

+See [nextflow.config](./nextflow.config) for a full example of each provenance format.
+
`prov.patterns`

List of file patterns to include in the provenance report, from the set of published files. By default, all published files are included.
@@ -114,16 +118,16 @@ Following these step to package, upload and publish the plugin:

2. Update the `Plugin-Version` field in the following file with the release version:

-```bash
-plugins/nf-prov/src/resources/META-INF/MANIFEST.MF
-```
+```bash
+plugins/nf-prov/src/resources/META-INF/MANIFEST.MF
+```

3. Run the following command to package and upload the plugin in the GitHub project releases page:

-```bash
-./gradlew :plugins:nf-prov:upload
-```
+```bash
+./gradlew :plugins:nf-prov:upload
+```

-4. Create a pull request against the [nextflow-io/plugins](https://github.com/nextflow-io/plugins/blob/main/plugins.json)
-project to make the plugin public accessible to Nextflow app.
+4. Create a pull request against the [nextflow-io/plugins](https://github.com/nextflow-io/plugins/blob/main/plugins.json)
+project to make the plugin public accessible to Nextflow app.

47 changes: 47 additions & 0 deletions WRROC.md
@@ -0,0 +1,47 @@
# Additional WRROC configuration

*New in version 1.4.0*

The `wrroc` format supports additional options to configure certain aspects of the Workflow Run RO-Crate. These fields cannot be inferred automatically from the pipeline or the run, and so must be entered through the config.

The following config options are supported:

- `prov.formats.wrroc.agent.contactType`
- `prov.formats.wrroc.agent.email`
- `prov.formats.wrroc.agent.name`
- `prov.formats.wrroc.agent.orcid`
- `prov.formats.wrroc.agent.phone`
- `prov.formats.wrroc.agent.ror`
- `prov.formats.wrroc.organization.contactType`
- `prov.formats.wrroc.organization.email`
- `prov.formats.wrroc.organization.name`
- `prov.formats.wrroc.organization.phone`
- `prov.formats.wrroc.organization.ror`
- `prov.formats.wrroc.license`
- `prov.formats.wrroc.publisher`

Refer to the [WRROC User Guide](https://www.researchobject.org/workflow-run-crate/) for more information about the associated RO-Crate entities.

Here is an example config:

```groovy
prov {
    formats {
        wrroc {
            agent {
                name = "John Doe"
                orcid = "https://orcid.org/0000-0000-0000-0000"
                email = "[email protected]"
                phone = "(0)89-99998 000"
                contactType = "Researcher"
            }
            organization {
                name = "University of XYZ"
                ror = "https://ror.org/000000000"
            }
            license = "https://spdx.org/licenses/MIT"
            publisher = "https://ror.org/000000000"
        }
    }
}
```
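
As a rough sketch, the pass-through options above could be combined with the `file` and `overwrite` settings that this commit adds to `nextflow.config`. The agent details and the `params.outdir` value are placeholders, and enabling the plugin through a plain `plugins` block is an assumption that does not come from this diff.

```groovy
// Assumption: the plugin is enabled through the standard plugins block.
plugins {
    id 'nf-prov'
}

// Placeholder output directory referenced by the report path below.
params.outdir = 'results'

prov {
    formats {
        wrroc {
            // Output location and overwrite behaviour, as in this commit's nextflow.config
            file = "${params.outdir}/ro-crate-metadata.json"
            overwrite = true

            // Pass-through metadata described in WRROC.md (placeholder values)
            agent {
                name = "Jane Doe"
                orcid = "https://orcid.org/0000-0000-0000-0000"
            }
            license = "https://spdx.org/licenses/MIT"
        }
    }
}
```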
8 changes: 8 additions & 0 deletions nextflow.config
@@ -20,5 +20,13 @@ prov {
             file = "${params.outdir}/manifest.json"
             overwrite = true
         }
+        wrroc {
+            file = "${params.outdir}/ro-crate-metadata.json"
+            overwrite = true
+        }
     }
 }
+
+manifest {
+    license = "https://spdx.org/licenses/Apache-2.0"
+}
14 changes: 5 additions & 9 deletions plugins/nf-prov/src/main/nextflow/prov/PathNormalizer.groovy
@@ -32,8 +32,6 @@ class PathNormalizer {

     private String commitId

-    private String launchDir
-
     private String projectDir

     private String workDir
@@ -42,14 +40,12 @@ class PathNormalizer {
         repository = metadata.repository ? new URL(metadata.repository) : null
         commitId = metadata.commitId
         projectDir = metadata.projectDir.toUriString()
-        launchDir = metadata.launchDir.toUriString()
         workDir = metadata.workDir.toUriString()
     }

     /**
-     * Normalize paths so that local absolute paths become
-     * relative paths, and local paths derived from remote URLs
-     * become the URLs.
+     * Normalize paths against the original remote URL, or
+     * work directory, where appropriate.
      *
      * @param path
      */
@@ -66,9 +62,9 @@ class PathNormalizer {
         if( repository && path.startsWith(projectDir) )
             return getProjectSourceUrl(path)

-        // replace launch directory with relative path
-        if( path.startsWith(launchDir) )
-            return path.replace(launchDir + '/', '')
+        // encode local absolute paths as file URLs
+        if( path.startsWith('/') )
+            return 'file://' + path

         return path
     }
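
The new final branch of this hunk can be illustrated in isolation. The following is a minimal standalone sketch, not the plugin's `PathNormalizer`: the function name `encodeLocalPath` and the sample paths are hypothetical, and the earlier branches (project source URLs and any remote handling) are left out.

```groovy
// Hypothetical helper mirroring only the last branch of the hunk above.
String encodeLocalPath(String path) {
    // Local absolute paths are encoded as file URLs; anything else passes through unchanged.
    if( path.startsWith('/') )
        return 'file://' + path
    return path
}

// A local absolute path gains the file:// scheme
assert encodeLocalPath('/data/results/out.txt') == 'file:///data/results/out.txt'
// A path that is already a URL is returned unchanged
assert encodeLocalPath('s3://bucket/results/out.txt') == 's3://bucket/results/out.txt'
```

Compared with the removed branch, paths under the launch directory are no longer rewritten as relative paths; they stay absolute and are only given a URL scheme.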
32 changes: 32 additions & 0 deletions plugins/nf-prov/src/main/nextflow/prov/ProvHelper.groovy
@@ -19,6 +19,7 @@ package nextflow.prov
 import java.nio.file.Path

 import groovy.transform.CompileStatic
+import nextflow.Session
 import nextflow.exception.AbortOperationException
 import nextflow.file.FileHelper
 import nextflow.processor.TaskRun
@@ -49,6 +50,15 @@ class ProvHelper {
         }
     }

+    /**
+     * Get the remote file staging directory for a workflow run.
+     *
+     * @param session
+     */
+    static Path getStageDir(Session session) {
+        return session.workDir.resolve("stage-${session.uniqueId}")
+    }
+
     /**
      * Get the list of output files for a task.
      *
@@ -98,4 +108,26 @@ class ProvHelper {
         return result
     }

+    /**
+     * Determine whether a task input file was staged into the work directory.
+     *
+     * @param source
+     * @param session
+     */
+    static boolean isStagedInput(Path source, Session session) {
+        return source.startsWith(getStageDir(session))
+    }
+
+    /**
+     * Determine whether a task input file was created in the work/tmp/
+     * directory (i.e. by a collectFile operator).
+     *
+     * @param source
+     * @param session
+     */
+    static boolean isTmpInput(Path source, Session session) {
+        final tmpDir = session.workDir.resolve('tmp')
+        return source.startsWith(tmpDir)
+    }
+
 }
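
The two new predicates rely on `java.nio.file.Path.startsWith`, which compares whole path name elements rather than string prefixes. The sketch below shows that behaviour outside of Nextflow; it is an illustration under stated assumptions, not the plugin code. The function names, the work directory, and the run id are hypothetical stand-ins for `session.workDir` and `session.uniqueId`.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical stand-in for getStageDir(session): <workDir>/stage-<uniqueId>
Path stageDirFor(Path workDir, String uniqueId) {
    return workDir.resolve('stage-' + uniqueId)
}

// Hypothetical stand-in for isStagedInput(source, session)
boolean isStaged(Path source, Path workDir, String uniqueId) {
    return source.startsWith(stageDirFor(workDir, uniqueId))
}

// Hypothetical stand-in for isTmpInput(source, session)
boolean isTmp(Path source, Path workDir) {
    return source.startsWith(workDir.resolve('tmp'))
}

def workDir = Paths.get('/scratch/work')
def runId = 'run-0001'

// A file under the run's staging directory counts as staged
assert isStaged(Paths.get('/scratch/work/stage-run-0001/ab/cdef/reads.fq'), workDir, runId)
// A file under work/tmp counts as a tmp input
assert isTmp(Paths.get('/scratch/work/tmp/12/3456/collected.txt'), workDir)
// Name elements are compared whole, so a sibling directory with a longer name does not match
assert !isStaged(Paths.get('/scratch/work/stage-run-0001-old/reads.fq'), workDir, runId)
```

Because the comparison works on name elements, a staging directory from a different run whose id merely shares a prefix is not mistaken for the current one.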
5 changes: 4 additions & 1 deletion plugins/nf-prov/src/main/nextflow/prov/ProvObserver.groovy
@@ -40,7 +40,7 @@ import nextflow.trace.TraceRecord
 @CompileStatic
 class ProvObserver implements TraceObserver {

-    public static final List<String> VALID_FORMATS = ['bco', 'dag', 'legacy']
+    public static final List<String> VALID_FORMATS = ['bco', 'dag', 'legacy', 'wrroc']

     private Session session

@@ -71,6 +71,9 @@ class ProvObserver implements TraceObserver {
         if( name == 'legacy' )
             return new LegacyRenderer(opts)

+        if( name == 'wrroc' )
+            return new WrrocRenderer(opts)
+
         throw new IllegalArgumentException("Invalid provenance format -- valid formats are ${VALID_FORMATS.join(', ')}")
     }
