Modernize standalone BLaDE runtime, prefix sums, and large-system support #10
Open
stanislc wants to merge 47 commits into
- Added nprint parameter for controlling MINI> output frequency (default: 50)
- Fixed SD damping to halve first, then scale by 2.4 on energy decrease
- Added NaN/Inf energy detection with fatal error for both SD and SDFD
- Added dxRMS cap at 10x initial value to prevent unbounded step growth
- Added MINI> output showing step, energy, deltaE, and RMS gradient
- Wrapped debug output with verbose > 1 checks for cleaner output
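The step-size rules listed above can be read in more than one way. The sketch below shows one reading, as host-side C++: the step is halved on every iteration and then multiplied by 2.4 when the energy decreased (a net 1.2x growth), with the 10x cap applied afterwards. The struct and function names are hypothetical, and the exact ordering inside minimize.cu is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <stdexcept>

// Hypothetical step-size state for steepest descent (names are illustrative).
struct StepControl {
  double dxRMS;         // current RMS step length
  double dxRMSInitial;  // RMS step length at the start of minimization
  double prevEnergy;    // energy from the previous step
};

// Update the step length after the energy of a new step is known.
// A NaN or Inf energy is treated as fatal, modeled here as an exception.
inline void updateStep(StepControl& s, double energy) {
  if (!std::isfinite(energy)) {
    throw std::runtime_error("non-finite energy during minimization");
  }
  s.dxRMS *= 0.5;                // halve first
  if (energy < s.prevEnergy) {
    s.dxRMS *= 2.4;              // then grow on energy decrease (net 1.2x)
  }
  // Cap at 10x the initial value to prevent unbounded step growth.
  s.dxRMS = std::min(s.dxRMS, 10.0 * s.dxRMSInitial);
  s.prevEnergy = energy;
}
```

Under this reading, a run of successful steps grows the step geometrically until the cap is reached, while any failed step halves it.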
…ONS DIHE compatibility

Dihedral restraints (originally from ChrispyYeh/BLaDE, PR RyanLeeHayes#5):
- Added WIDTH field to DiRestPotential for flat-bottom harmonic potential
- Based on CHARMM CDIH reference (default width=0 = pure harmonic)
- Fixed bug: block field was not being copied to GPU

CHARMM CONS DIHE compatibility fixes:
- Periodic: changed n*phi - phi0 to n*(phi - phi0)
- Harmonic: E = K*(phi-phi0)^2 without 0.5 factor (matches CHARMM)
- Verified against CHARMM CONS DIHE calculated values

Build system modernization:
- CMakeLists.txt: Updated to CMake 3.18+ with native CUDA support
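To make the dihedral restraint changes above concrete, here is a minimal C++ sketch of a flat-bottom harmonic energy without the 0.5 factor, plus the corrected periodic phase argument. The function names are hypothetical. Treating WIDTH as a half-width in radians around phi0 is an assumption, and the sketch does not attempt to reproduce the kernel in restrain.cu.

```cpp
#include <algorithm>
#include <cmath>

// Wrap an angle difference into [-pi, pi).
inline double wrapAngle(double a) {
  const double pi = 3.14159265358979323846;
  const double twoPi = 2.0 * pi;
  a = std::fmod(a + pi, twoPi);
  if (a < 0.0) a += twoPi;
  return a - pi;
}

// Flat-bottom harmonic restraint: zero while |phi - phi0| <= width,
// K * excess^2 outside. With width = 0 this is the pure harmonic
// E = K*(phi - phi0)^2 (note: no 0.5 factor).
inline double flatBottomDihedralEnergy(double phi, double phi0,
                                       double K, double width) {
  const double d = std::fabs(wrapAngle(phi - phi0));
  const double excess = std::max(d - width, 0.0);
  return K * excess * excess;
}

// Phase argument of the periodic form after the fix: n*(phi - phi0),
// rather than n*phi - phi0.
inline double periodicArgument(int n, double phi, double phi0) {
  return n * (phi - phi0);
}
```

With n*(phi - phi0) the argument vanishes at phi = phi0 for every periodicity n, which is not true of n*phi - phi0 unless n = 1.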
Options:
- --double: Use double precision (default: float)
- --gromacs: Use NMPS/GROMACS units (default: AKMA/CHARMM)
- -j N, --jobs N: Parallel build jobs
- --install [PREFIX]: Build and install
- --reconfigure: Force CMake reconfiguration
- --clean: Remove build directory
- -h, --help: Show usage
- Add EElec enum (efswitch, epme, efshift) and EVdw enum (evswitch, evfswitch, evshift)
- Change template dispatch from bool to int for 9 method combinations (3 elec × 3 VDW)
- FSHIFT: Force-shifted electrostatics E = kqq*(1/r - 2/roff + r/roff²)
- VSHIFT: Potential-shifted VDW with smooth energy cutoff
- Update nbdirect.cu and pair.cu kernels with new method dispatch
- Default: VFSWITCH + PME (backward compatible)
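The FSHIFT formula and the int-based template dispatch above can be sketched in plain C++ as follows. The enum names come from the commit message; their numeric values, the PairTerm struct, and the dispatch helpers are assumptions for illustration, and kernelVariant only returns an index where the real code would launch a kernel.

```cpp
#include <cmath>

// Enum names from the commit message; the numeric values are assumed.
enum EElec { efswitch, epme, efshift };
enum EVdw { evswitch, evfswitch, evshift };

struct PairTerm { double energy; double force; };  // force = -dE/dr

// Force-shifted electrostatics: E = kqq*(1/r - 2/roff + r/roff^2) for r < roff.
// Both the energy and the force go to zero at r = roff.
inline PairTerm fshiftElec(double kqq, double r, double roff) {
  if (r >= roff) return {0.0, 0.0};
  const double invR = 1.0 / r;
  const double invRoff2 = 1.0 / (roff * roff);
  return {kqq * (invR - 2.0 / roff + r * invRoff2),
          kqq * (invR * invR - invRoff2)};
}

// Stand-in for a kernel instantiated per method pair (3 x 3 = 9 variants).
template <int elecMethod, int vdwMethod>
inline int kernelVariant() { return 3 * elecMethod + vdwMethod; }

// Map the runtime VDW choice onto a compile-time template argument.
template <int elecMethod>
inline int dispatchVdw(EVdw v) {
  switch (v) {
    case evswitch:  return kernelVariant<elecMethod, evswitch>();
    case evfswitch: return kernelVariant<elecMethod, evfswitch>();
    default:        return kernelVariant<elecMethod, evshift>();
  }
}

// Map the runtime electrostatics choice, then the VDW choice.
inline int dispatch(EElec e, EVdw v) {
  switch (e) {
    case efswitch: return dispatchVdw<efswitch>(v);
    case epme:     return dispatchVdw<epme>(v);
    default:       return dispatchVdw<efshift>(v);
  }
}
```

Differentiating the FSHIFT energy gives -dE/dr = kqq*(1/r² - 1/roff²), which is why the force vanishes smoothly at the cutoff.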
Includes reduction.h and scan.h with hierarchical Blelloch and Hillis-Steele algorithms. Uses preprocessor guards for CHARMM API compatibility (BLADE_STANDALONE for standalone builds).
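For readers unfamiliar with the two scan algorithms named above, here is a sequential C++ sketch of each on a single array. It emulates the data flow of the parallel versions and does not reflect the multi-block (hierarchical) structure or the API of scan.h; the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hillis-Steele inclusive scan: log2(n) passes, each adding the element
// `offset` positions to the left. O(n log n) work, but short dependency chains.
inline std::vector<int> hillisSteeleInclusive(std::vector<int> in) {
  std::vector<int> out(in.size());
  for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
    for (std::size_t i = 0; i < in.size(); ++i) {
      out[i] = (i >= offset) ? in[i] + in[i - offset] : in[i];
    }
    in.swap(out);  // the result of this pass is the input of the next
  }
  return in;
}

// Blelloch exclusive scan: up-sweep (reduce) then down-sweep. O(n) work.
// The input is zero-padded to a power of two for the tree traversal.
inline std::vector<int> blellochExclusive(const std::vector<int>& in) {
  if (in.empty()) return {};
  std::size_t n = 1;
  while (n < in.size()) n *= 2;
  std::vector<int> a(n, 0);
  for (std::size_t i = 0; i < in.size(); ++i) a[i] = in[i];

  // Up-sweep: build partial sums at the right end of each subtree.
  for (std::size_t stride = 1; stride < n; stride *= 2) {
    for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
      a[i] += a[i - stride];
    }
  }
  a[n - 1] = 0;  // identity at the root

  // Down-sweep: push prefix sums back down the tree.
  for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
    for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
      const int left = a[i - stride];
      a[i - stride] = a[i];
      a[i] += left;
    }
  }
  a.resize(in.size());
  return a;
}
```

Hillis-Steele does more total work but needs fewer synchronized phases, while Blelloch is work-efficient. Which one is faster depends on array size and hardware, which is presumably why the algorithm is selectable at run time.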
Refactoring for code clarity:
- Add msldEwaldType parameter to pair kernel
- Explicit mode-specific lambda force calculation
- Mode 0 (PMEL not specified) now works like Mode ON
- Matches DOMDEC behavior in ebonded_domdec.F90
- Single Ctrl+C: sets interrupt flag for graceful stop
- 3x rapid Ctrl+C: force immediate exit
- API: blade_install/restore_signal_handler, blade_check_interrupt
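The two-stage Ctrl+C behavior above could be implemented roughly as in the following C++ sketch. The demo-prefixed function names are hypothetical stand-ins, not the signatures of the blade_* API, and the 2-second window that defines "rapid" is an assumption.

```cpp
#include <csignal>
#include <cstdlib>
#include <ctime>

// All names below are illustrative; the real API signatures are not shown in the PR text.
static volatile std::sig_atomic_t g_interruptFlag = 0;
static volatile std::sig_atomic_t g_pressCount = 0;
static std::time_t g_lastPress = 0;
static void (*g_previousHandler)(int) = SIG_DFL;
static const int kRapidWindowSeconds = 2;  // assumed definition of "rapid"

// SIGINT handler: the first press requests a graceful stop,
// the third press within the window exits immediately.
static void demoOnSigint(int) {
  const std::time_t now = std::time(nullptr);
  g_pressCount = (now - g_lastPress <= kRapidWindowSeconds) ? g_pressCount + 1 : 1;
  g_lastPress = now;
  g_interruptFlag = 1;
  if (g_pressCount >= 3) std::_Exit(130);  // conventional exit code for SIGINT
}

inline void demoInstallSignalHandler() {
  g_previousHandler = std::signal(SIGINT, demoOnSigint);
}

inline void demoRestoreSignalHandler() {
  std::signal(SIGINT, g_previousHandler);
}

// Polled by the run loop between steps to stop gracefully.
inline bool demoCheckInterrupt() { return g_interruptFlag != 0; }
```

The run loop would poll demoCheckInterrupt() between steps, finish writing output, and return, so a single Ctrl+C never leaves files half-written.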
- GPU-side detection with atomic flag
- Reports first atom with invalid velocity
- Helps identify simulation instabilities
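The atomic-flag pattern above can be emulated on the host with std::atomic, as in this C++ sketch. Each loop iteration stands in for one GPU thread; in a CUDA kernel the compare-exchange loop would typically be a single atomicMin. The names are hypothetical, and interpreting "first atom" as the lowest atom index is an assumption.

```cpp
#include <atomic>
#include <climits>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

// Returns the lowest index of an atom whose velocity has a NaN or Inf
// component, or -1 if every velocity is finite.
inline int findFirstInvalidVelocity(const std::vector<Vec3>& v) {
  std::atomic<int> flag(INT_MAX);  // INT_MAX means "nothing detected"
  for (int i = 0; i < static_cast<int>(v.size()); ++i) {  // one "thread" per atom
    const bool ok = std::isfinite(v[i].x) && std::isfinite(v[i].y) &&
                    std::isfinite(v[i].z);
    if (!ok) {
      // Atomic minimum: keep the smallest offending index.
      int current = flag.load();
      while (i < current && !flag.compare_exchange_weak(current, i)) {
      }
    }
  }
  const int result = flag.load();
  return result == INT_MAX ? -1 : result;
}
```

The host only needs to copy back a single integer per check, so the detection adds little overhead to each step.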
- verbose=0: minimal output (default)
- verbose=1: progress output (MINI> lines, setup messages)
- verbose=2: debug output (detailed energy, gradient info)
- blade_set_verbose() API for external control
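A minimal C++ sketch of level-gated logging in the spirit of the verbosity levels above. The demo-prefixed names and the string sink are hypothetical; the real blade_set_verbose() and logging signatures are not shown in the PR text, so none of this should be read as the actual API.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical state: current verbosity and a sink that collects output.
// A real consumer would route messages to stdout or to its own logger.
static int g_verbose = 0;
static std::string g_sink;

inline void demoSetVerbose(int level) { g_verbose = level; }

// Emit a printf-style message only if the current verbosity is at least
// minLevel (1 = progress output, 2 = debug output).
inline void demoLog(int minLevel, const char* fmt, ...) {
  if (g_verbose < minLevel) return;
  char buffer[512];
  va_list args;
  va_start(args, fmt);
  std::vsnprintf(buffer, sizeof buffer, fmt, args);
  va_end(args);
  g_sink += buffer;
}
```

With this shape, a call such as demoLog(1, "MINI> ...") is silent at the default verbose=0 and appears once the level is raised to 1, while debug messages tagged with level 2 stay hidden until verbose=2.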
- Remove deprecated eecats energy term
- Add descriptive comments to energy enum
- Add PNOE reference position fields (c0x, c0y, c0z)
- Add is_pnoe flag for NOE restraint type
BLaDE declares blade_log() without implementing it. Consumers (standalone build, CHARMM) provide implementations.
Routes logging to stdout. Only compiled for standalone builds.
Use blade_log.h declaration instead. Core is now clean of platform-specific code.
When ON (default), includes blade_log.cpp for standalone builds. When OFF (submodule), consumer provides blade_log implementation.
Makes standalone build intent explicit in the cmake configuration.
Routes all output through blade_log API for CHARMM compatibility.
Memory allocation info is debug-level output.
Heuristic parameters are debug-level output, not normal progress.
PSF progress is normal-level output, shown with verbose >= 1.
- potential.cxx: Add back if (boRestCount>0), if (anRestCount>0), if (diRestCount>0) guards to prevent zero-size allocations
- minimize.cu: Add stalled minimization detection with extra damping when energy is unchanged between steps
- minimize.cu (SDFD): Add initial damping and dxRMS cap

These fixes restore physics-correct behavior that was present in CHARMM's working version.
- Fix memory leak: add cudaFree(nanFlag_d) to State destructor
- Fix NaN detection: change fprintf to fatal() to terminate on NaN/Inf
- Set vdwMethod/elecMethod enums from API parameters
- Add vdwmethod/elecmethod keywords for standalone input parsing
- Update RUN PRINT output to show method names
- Fix CMakeLists.txt comment: blade_log.cpp -> blade_log.cxx
Add per-block lambda control to the standalone BLaDE input script parser:
- Allocate thetaFriction and blockFixed arrays in nblocks section
- Propagate 'msld gamma' to per-block thetaFriction array
- New 'msld friction <block> <value>' keyword for per-block friction
- Extended 'msld fix' to accept block indices (e.g. 'msld fix 2 5')
- Zero mass/velocity for fixed blocks in Msld::initialize() so the integrator skips those DOFs (isfinite(1/sqrt(0)) == false)
- Add test_ffix.inp test case for RNaseH_peptide system
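The zero-mass trick in the list above relies on IEEE floating-point behavior: 1/sqrt(0) evaluates to +inf, which fails isfinite. The C++ sketch below shows how an integrator step could use that test to skip fixed degrees of freedom. The function names and the simple velocity update are hypothetical and are not taken from update.cu.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A fixed DOF is given mass 0, so 1/sqrt(mass) is +inf and not finite.
inline bool isFixedDof(double mass) {
  return !std::isfinite(1.0 / std::sqrt(mass));
}

// Illustrative velocity update that leaves fixed DOFs untouched.
inline void kickVelocities(std::vector<double>& velocity,
                           const std::vector<double>& force,
                           const std::vector<double>& mass, double dt) {
  for (std::size_t i = 0; i < velocity.size(); ++i) {
    if (isFixedDof(mass[i])) continue;  // skipped: lambda stays where it is
    velocity[i] += dt * force[i] / mass[i];
  }
}
```

Because the fixed DOFs also start with zero velocity, skipping the update keeps them stationary for the whole run without a separate per-step mask.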
Pull request overview
This PR syncs standalone BLaDE forward with reliability improvements, expanded parser/runtime controls, and multiple physics/kernel enhancements (PMEL/FSHIFT/VSHIFT, per-block friction, partial FFIX, NaN/Inf detection), along with a modernized standalone build flow.
Changes:
- Add platform-independent logging (blade_log) and adjust verbosity behavior across setup/parser/runtime paths.
- Extend dynamics/minimization/runtime robustness (Ctrl+C handling, NaN/Inf detection, minimization step-size behavior).
- Expand nonbonded / MSLD / domdec capabilities (VDW/ELEC method enums, PMEL handling, per-block friction & partial FFIX, scan helpers, dynamic partner-array growth).
Reviewed changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| test/ec2acc/input_test | Fix EC2ACC test input to explicitly specify PDB coordinate type. |
| test/RNaseH_peptide/blade/test_ffix.inp | Add standalone parser test for per-block friction + partial FFIX. |
| src/update/update.cu | Add NaN flagging in update kernels; per-lambda friction/noise usage. |
| src/update/rex.cxx | Route replica-exchange logs via blade_log. |
| src/update/pressure.cu | Reduce default verbosity and route pressure-coupling logs via blade_log. |
| src/update/minimize.cu | Refine minimization output/verbosity; NaN/Inf energy checks; adaptive step behavior. |
| src/system/system.cxx | Route system/help output via blade_log; update verbose help text. |
| src/system/structure.h | Extend structure APIs (PSF reader signature; NOE/DiRest signatures). |
| src/system/structure.cxx | Route structure parsing logs via blade_log; add NOE validation + PNOE params. |
| src/system/state.h | Add NaN-flag plumbing and per-lambda friction/noise pointers in LeapState. |
| src/system/state.cxx | Allocate/free NaN flag; build per-block friction/noise arrays; check NaN flag in recv_energy. |
| src/system/selections.cxx | Route selection logs via blade_log. |
| src/system/potential.h | Clarify energy term enum; extend NOE/DiRest structs (PNOE + flat-bottom width). |
| src/system/potential.cxx | Gate some logs by verbosity; guard allocations when restraint counts are zero; copy DiRest width. |
| src/system/parameters.cxx | Route parameter parsing/dump logs via blade_log; remove noisy debug print. |
| src/system/coordinates.cxx | Route coordinates help/NYI output via blade_log. |
| src/run/run.h | Add interrupt/scanalgorithm APIs; add EVdw/EElec enums; add minimization print frequency. |
| src/run/run.cu | Implement Ctrl+C flagging + public API; add scanalgorithm plumbing; route run output via blade_log. |
| src/restrain/restrain.cu | Add PNOE handling and r=0 guard in NOE; implement flat-bottom torsion width for DiRest. |
| src/nbdirect/nbdirect.cu | Replace boolean switching with VDW/ELEC method dispatch (3×3 combinations); add FSHIFT + VSHIFT support. |
| src/msld/msld.h | Add per-block theta friction and per-block fixed flags; add C API for both. |
| src/msld/msld.cu | Implement per-block friction/fixed blocks; partial FFIX parsing; update PMEL pair filtering; bias term mapping. |
| src/main/scan.h | Add multi-block scan utilities with selectable algorithm and workspace management. |
| src/main/reduction.h | Add multi-block reduction utilities for real/real3/real33 with workspace support. |
| src/main/defines.h | Add make_real3 helper macro for float/double builds. |
| src/main/blade_log.h | Introduce platform-independent logging API declaration (blade_log). |
| src/main/blade_log.cxx | Provide standalone blade_log implementation (stdout/fflush). |
| src/io/variables.cxx | Route variables output/debug through blade_log. |
| src/io/io.cxx | Route interpreter/energy display/fatal/file-open logging through blade_log. |
| src/io/control.cxx | Route if/while control-flow logs through blade_log. |
| src/domdec/domdec.h | Add overflow flag and partner-array reallocation hook for domdec culling. |
| src/domdec/domdec.cu | Allocate/free overflow flag; gate domdec diagnostic logs by verbosity. |
| src/domdec/cull.cu | Add overflow detection + retry growth for partner arrays; add reallocation helper. |
| src/domdec/assign_excl.cu | Route exclusion reallocation note via blade_log. |
| src/domdec/assign_blocks.cu | Add large-ndiv multi-block scan-based block-bound assignment path + scan benchmarking. |
| src/bonded/pair.cu | Extend pair kernels to method enums; add FSHIFT support; PMEL mode-specific nbex scaling/forces. |
| src/CMakeLists.txt | Modernize CMake/CUDA toolchain discovery; add BLADE_STANDALONE option; set CUDA arch list; link NVTX if available. |
| doc/MAIN.txt | Document partial FFIX form and run setvariable scanalgorithm. |
| Compile.sh | Replace environment-module build flow with portable CMake build script + options. |
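The overflow detection and retry growth listed for src/domdec/cull.cu in the table above follows a common GPU pattern: run the kernel against a fixed-capacity buffer, set a flag if it would overflow, then grow the buffer and rerun. The C++ sketch below emulates that loop on the host. All names are hypothetical, and the doubling growth factor is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the culling kernel: stores at most `capacity` partners and
// returns true if some candidates did not fit (the overflow flag).
inline bool cullInto(const std::vector<int>& candidates,
                     std::vector<int>& partners, std::size_t capacity) {
  partners.clear();
  bool overflow = false;
  for (int c : candidates) {
    if (partners.size() < capacity) {
      partners.push_back(c);
    } else {
      overflow = true;  // never write past the end, just record the failure
    }
  }
  return overflow;
}

// Host-side retry loop: grow the capacity and rerun until everything fits.
// Returns the final capacity.
inline std::size_t cullWithRetry(const std::vector<int>& candidates,
                                 std::vector<int>& partners,
                                 std::size_t capacity) {
  while (cullInto(candidates, partners, capacity)) {
    capacity = (capacity == 0) ? 1 : capacity * 2;  // assumed growth policy
  }
  return capacity;
}
```

The retry costs one extra pass only when the estimate was too small, which lets the initial allocation stay modest for typical systems while still handling large ones.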
Summary
This draft PR collects the standalone BLaDE updates from stanislc/BLaDE that are ahead of RyanLeeHayes/BLaDE:master.
Main changes include:
- BLADE_STANDALONE and Compile.sh options for precision, units, install, clean, and reconfigure
- blade_log and verbosity controls, then route setup/runtime output through the logging API
- MINI> progress output
- vdwmethod/elecmethod controls
Validation
- rtk ./Compile.sh -j 8
- rtk git diff --check upstream/master..HEAD
- 6727200 on gpu2080/gollum154: completed with exit 0:0
- 6727201 on gpu2080/gollum154: completed with exit 0:0
Notes
This PR intentionally targets only the standalone BLaDE repository. CHARMM integration-specific differences are not included here.