
Fix: Boltz-2 cache path, pLDDT output, and RFdiffusion v3 num_designs handling #80

Open
rnaidu-seqera wants to merge 14 commits into main from alternates_full_pipeline


@rnaidu-seqera

Description

This PR fixes three bugs in the RFdiffusion v3 and Boltz-2 stages of the pipeline, each of which caused silently incorrect behaviour on Seqera Platform CE.

1. Boltz-2 cache directory ignored on Platform (modules/local/boltz2_refold.nf)

The --cache flag was hardcoded to the literal string boltz2_cache instead of the staged path of the cache_dir input. On Platform, the staged directory is not named boltz2_cache, so Boltz-2 silently fell back to downloading the model weights on every run.

Fix: use ${cache_dir} (the Nextflow-staged path object) so the correct directory is always passed.
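A minimal sketch of the failure mode, assuming a hypothetical staged directory name (boltz_weights_staged) and a hypothetical helper (cache_exists); it does not run Boltz-2 and only shows that the literal boltz2_cache does not resolve when the input is staged under another name, while the staged path does:

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical names): why a hardcoded --cache value fails
# when the cache input is staged under a different name.
set -euo pipefail

# Succeeds only if the value that would be passed to --cache is an existing directory.
cache_exists() {
  [ -d "$1" ]
}

work=$(mktemp -d)
cd "$work"

# Stand-in for the directory Nextflow stages for the cache_dir input.
# On Platform its name need not be "boltz2_cache".
staged_cache="boltz_weights_staged"
mkdir -p "$staged_cache"

old_arg="boltz2_cache"    # before the fix: literal string
new_arg="$staged_cache"   # after the fix: what ${cache_dir} expands to in the module

for arg in "$old_arg" "$new_arg"; do
  if cache_exists "$arg"; then
    echo "--cache $arg: found, cached weights can be reused"
  else
    echo "--cache $arg: missing, weights would be downloaded"
  fi
done
```

In the module itself, ${cache_dir} is interpolated by Nextflow before the script runs; the bash variables here only stand in for that substitution.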

2. IPSAE_CALCULATE never ran (modules/local/boltz2_refold.nf)

Boltz-2 produces plddt_*.npz files alongside PAE and CIF outputs, but the output-collection loop in the script only copied CIF, PAE, confidence JSON, and affinity files. The IPSAE workflow block requires pLDDT NPZ files to build its input channel, so with none present the channel was always empty and IPSAE_CALCULATE was skipped entirely.

Fix: add a copy step for plddt*.npz files, mirroring the existing PAE copy block.
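A minimal sketch of an output-collection loop with the added pLDDT pattern, assuming a hypothetical flat layout where all Boltz-2 outputs sit in one prediction directory; the function name collect_outputs and the exact globs are illustrative, not the code in boltz2_refold.nf:

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming a hypothetical flat layout: Boltz-2 prediction files
# in pred_dir are collected into out_dir. The real globs and paths in
# boltz2_refold.nf may differ.
set -euo pipefail

collect_outputs() {
  local pred_dir=$1 out_dir=$2 f
  mkdir -p "$out_dir"
  for f in "$pred_dir"/*.cif \
           "$pred_dir"/confidence_*.json \
           "$pred_dir"/affinity_*.json \
           "$pred_dir"/pae*.npz \
           "$pred_dir"/plddt*.npz; do   # newly added pattern
    [ -e "$f" ] || continue              # skip globs that matched nothing
    cp "$f" "$out_dir"/
  done
}

# Demo with empty placeholder files.
work=$(mktemp -d)
mkdir -p "$work/pred"
touch "$work/pred/design1_model_0.cif" \
      "$work/pred/pae_design1_model_0.npz" \
      "$work/pred/plddt_design1_model_0.npz"
collect_outputs "$work/pred" "$work/out"
ls "$work/out"
```

Without the plddt*.npz pattern, the pLDDT file never reaches the output directory, which is the same situation that left the IPSAE input channel empty.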

3. num_designs and budget parameters ignored for RFdiffusion v3 (modules/local/rfdiffusion_v3_run.nf)

n_batches was set to budget instead of num_designs, so RFdiffusion v3 only ever generated as many backbones as the downstream budget, and the num_designs samplesheet column had no effect. Because the ranking step already selects the top budget designs, only n_batches needed to change.

Fix: set n_batches=${num_designs} so that the full requested number of backbones is generated before the top budget designs are selected for downstream stages.
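The generate-then-select flow can be sketched as follows, using the test profile's values (num_designs=3, budget=2). The generator and its scores are hypothetical placeholders standing in for RFdiffusion v3, and "higher score is better" is an assumption; the real ranking metric is not described here:

```shell
#!/usr/bin/env bash
# Minimal sketch of generate-then-select. generate_designs and its scores are
# hypothetical stand-ins for RFdiffusion v3; higher score = better is assumed.
set -euo pipefail

num_designs=3   # samplesheet column: backbones to generate
budget=2        # designs passed downstream

# Emit one "design_<i> <score>" line per backbone (deterministic fake scores).
generate_designs() {
  local n=$1 i
  for ((i = 0; i < n; i++)); do
    printf 'design_%d %s\n' "$i" "$(( (i * 37 + 11) % 100 ))"
  done
}

# Keep the top N designs by descending score, printing only their names.
select_top() {
  sort -k2,2nr | head -n "$1" | cut -d' ' -f1
}

# Before the fix: n_batches=budget, so ranking had nothing to filter out.
before=$(generate_designs "$budget" | select_top "$budget")

# After the fix: n_batches=num_designs, then keep the top budget designs.
all=$(generate_designs "$num_designs")
after=$(printf '%s\n' "$all" | select_top "$budget")

echo "before fix, selected: $(paste -sd' ' - <<<"$before")"
echo "after fix, selected:  $(paste -sd' ' - <<<"$after")"
```

With n_batches tied to budget, the selection step is a no-op; tying it to num_designs gives the ranking step extra candidates to discard.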

All three fixes were verified with an end-to-end local run using the test_design_rfdiffusion_v3 profile.

Type of change

  • Bug Fix

How Has This Been Tested?

All fixes were verified with a full end-to-end local run using the test_design_rfdiffusion_v3 profile on an NVIDIA H100 80GB GPU.

```shell
nextflow run main.nf -profile test_design_rfdiffusion_v3,docker
```

Test configuration (conf/test_design_rfdiffusion_v3.config):

  • 1 sample (design1_rfd), num_designs=3, budget=2, mpnn_num_seq_per_target=2
  • Target: Nipah Glycoprotein
  • boltz2_use_msa=false, boltz2_num_recycling=1, boltz2_num_diffusion=1

Verified:

  • RFDIFFUSION_V3_RUN produced 3 raw designs (_0, _1, _2) and 2 ranked PDB outputs, confirming that both num_designs and budget are respected
  • BOLTZ2_REFOLD completed for all 6 sequences with pLDDT NPZ files present in the output directory
  • IPSAE_CALCULATE ran for all 6 sequences (previously always skipped)
  • PRODIGY_PREDICT and CONSOLIDATE_METRICS completed successfully
  • Total: 25 processes, all succeeded, ~5.5 min runtime

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, and made corresponding changes to the documentation
  • New and existing unit tests pass locally with my changes

rnaidu-seqera and others added 13 commits April 24, 2026 08:59
- Add --protein_design_tool param to select between boltzgen, complexa, and rfdiffusion_v3
- Add --test_design_only mode to skip downstream analysis steps
- New modules: proteina_complexa_design.nf, rfdiffusion_v3_run.nf
- New per-tool samplesheet schemas: schema_input_boltzgen.json, schema_input_complexa.json, schema_input_rfdiffusion_v3.json
- Update main.nf with multi-tool samplesheet parsing and per-tool cache handling
- Update workflows/protein_design.nf with branched design stage and unified downstream
- Update nextflow.config with complexa and rfdiffusion_v3 params
- Update nextflow_schema.json with complexa_options and rfdiffusion_v3_options groups
- Update conf/base.config with GPU labels for new design processes
- Add test configs, samplesheets, and design YAMLs for all three backends
- Update documentation across all docs/ files
- Include RFdiffusion v3 test results for reference
…esigns are generated (makes sure num_design and budget parameters are used)
rnaidu-seqera self-assigned this May 8, 2026
@rnaidu-seqera
Author

rnaidu-seqera commented May 8, 2026

@FloWuenne I'm going to run on Platform to make sure config parameters are rendering correctly in the UI

rnaidu-seqera requested a review from FloWuenne May 8, 2026 21:50