Fix: Boltz-2 cache path, pLDDT output, and RFdiffusion v3 num_designs handling #80
Open
rnaidu-seqera wants to merge 14 commits into
- Add --protein_design_tool param to select between boltzgen, complexa, and rfdiffusion_v3
- Add --test_design_only mode to skip downstream analysis steps
- New modules: proteina_complexa_design.nf, rfdiffusion_v3_run.nf
- New per-tool samplesheet schemas: schema_input_boltzgen.json, schema_input_complexa.json, schema_input_rfdiffusion_v3.json
- Update main.nf with multi-tool samplesheet parsing and per-tool cache handling
- Update workflows/protein_design.nf with branched design stage and unified downstream
- Update nextflow.config with complexa and rfdiffusion_v3 params
- Update nextflow_schema.json with complexa_options and rfdiffusion_v3_options groups
- Update conf/base.config with GPU labels for new design processes
- Add test configs, samplesheets, and design YAMLs for all three backends
- Update documentation across all docs/ files
- Include RFdiffusion v3 test results for reference
… inputs and gpu docker config
…labs/nf-proteindesign into alternates_full_pipeline resolving merge conflicts
…latform compatibility
…esigns are generated (makes sure num_design and budget parameters are used)
Author
@FloWuenne I'm going to run on Platform to make sure config parameters are rendering correctly in the UI
Description
This PR fixes three bugs affecting the RFdiffusion v3 and Boltz-2 stages of the pipeline. Each one caused incorrect behaviour on Seqera Platform CE without raising an error.
1. Boltz-2 cache directory ignored on Platform (modules/local/boltz2_refold.nf)
The --cache flag was hardcoded to the literal string boltz2_cache instead of using the staged path of the input cache_dir. On Platform, the staged directory name never matches boltz2_cache, so Boltz-2 silently fell back to downloading model weights on every run.
Fix: use ${cache_dir} (the Nextflow-staged path object) so the correct directory is always passed.
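The difference between the two paths can be illustrated with a small shell sketch. It does not call Boltz-2; it only simulates a task directory in which the cache was staged under a name other than boltz2_cache. The directory name weights_from_platform, the placeholder file model.ckpt, and the helper cache_status are hypothetical and exist only for this illustration.

```shell
# Simulate a task work directory where the cache input was staged
# under a name other than "boltz2_cache" (this name is made up).
task_dir=$(mktemp -d)
cache_dir="$task_dir/weights_from_platform"
mkdir -p "$cache_dir"
touch "$cache_dir/model.ckpt"   # placeholder, not a real weights file

# Before the fix: a hardcoded literal name that does not exist here.
old_cache="$task_dir/boltz2_cache"
# After the fix: the staged path, as ${cache_dir} would resolve in the script.
new_cache="$cache_dir"

# Hypothetical helper: report whether a directory exists and is non-empty.
cache_status() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1")" ]; then
        echo "hit"
    else
        echo "miss"
    fi
}

old_status=$(cache_status "$old_cache")
new_status=$(cache_status "$new_cache")
echo "hardcoded: $old_status, staged: $new_status"
```

In this simulation the hardcoded name misses and the staged path hits, which corresponds to the re-download behaviour described above.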
2. IPSAE_CALCULATE never ran (modules/local/boltz2_refold.nf)
Boltz-2 produces plddt_*.npz files alongside PAE and CIF outputs, but the output-collection loop in the script only copied CIF, PAE, confidence JSON, and affinity files. The IPSAE workflow block requires pLDDT NPZ files to build its input channel, so with none present the channel was always empty and IPSAE_CALCULATE was skipped entirely.
Fix: add a copy step for plddt*.npz files, mirroring the existing PAE copy block.
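A copy step of this kind might look roughly like the sketch below. It is not the module's actual code: the directories pred_dir and out_dir and the example file names are assumptions made for the illustration, and only the plddt*.npz glob comes from the description above.

```shell
# Hypothetical prediction and output directories with example file names.
pred_dir=$(mktemp -d)
out_dir=$(mktemp -d)
touch "$pred_dir/plddt_design1_model_0.npz" \
      "$pred_dir/pae_design1_model_0.npz" \
      "$pred_dir/design1_model_0.cif"

# Copy every pLDDT NPZ file into the output directory.
copied=0
for f in "$pred_dir"/plddt*.npz; do
    [ -e "$f" ] || continue   # the glob matched nothing, skip the literal pattern
    cp "$f" "$out_dir/"
    copied=$((copied + 1))
done
echo "copied $copied pLDDT file(s)"
```

The existence check inside the loop keeps the step from failing when a prediction produces no pLDDT files, so an empty result surfaces downstream as an empty channel rather than as a copy error.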
3. num_designs and budget parameters ignored for RFdiffusion v3 (modules/local/rfdiffusion_v3_run.nf)
n_batches was set to budget instead of num_designs, so RFdiffusion only ever generated as many backbones as the downstream budget, and the num_designs samplesheet column had no effect. The ranking step that selects the top budget designs already handles the filtering, so only n_batches needed to change.
Fix: set n_batches=${num_designs} so the full requested number of backbones is generated before the top budget are selected for downstream.
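The effect on the candidate pool can be worked through with example numbers. The values num_designs=10 and budget=3 are made up, the helper selected_from is hypothetical, and the sketch assumes that the number of backbones generated equals n_batches, as the description above implies.

```shell
# Example values only; real values come from the samplesheet.
num_designs=10
budget=3

# Hypothetical helper: the ranking step keeps at most "budget" designs
# out of however many were generated.
selected_from() {
    generated=$1
    if [ "$generated" -lt "$budget" ]; then
        echo "$generated"
    else
        echo "$budget"
    fi
}

# Before the fix: n_batches followed budget.
n_batches_before=$budget
# After the fix: n_batches follows num_designs.
n_batches_after=$num_designs

# Assumption: the number of backbones generated equals n_batches.
pool_before=$n_batches_before
pool_after=$n_batches_after

echo "before: ranked $pool_before, kept $(selected_from "$pool_before")"
echo "after:  ranked $pool_after, kept $(selected_from "$pool_after")"
```

Under these assumptions the number of designs passed downstream is 3 in both cases, but after the fix they are chosen from 10 candidates instead of 3, so the ranking step has something to rank.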
All three fixes were verified with an end-to-end local run using the test_design_rfdiffusion_v3 profile.
Type of change
How Has This Been Tested?
All fixes were verified with a full end-to-end local run using the test_design_rfdiffusion_v3 profile on an NVIDIA H100 80GB GPU.
nextflow run main.nf -profile test_design_rfdiffusion_v3,docker
Test configuration (conf/test_design_rfdiffusion_v3.config):
Verified:
Checklist: