Objective: Train a hybrid model that combines a non-parametric (latent-based) component with a parametric (MLP) component to predict species relative abundances from metabarcoding data.
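A minimal sketch of what "combining" the two components could look like. It assumes, purely for illustration, that each component emits one logit per species for each sample, that a gate in [0, 1] blends the logits before a softmax turns them into relative abundances, and that training minimizes cross-entropy against observed proportions. The function names are hypothetical; the repository's gating functions may instead blend probabilities or use a different parameterization.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps np.exp from overflowing."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predict_relative_abundance(
    latent_logits: np.ndarray, mlp_logits: np.ndarray, gate: np.ndarray
) -> np.ndarray:
    """Hypothetical hybrid prediction: gate-weighted blend of two logit sets, then softmax.

    gate broadcasts against the logits: shape (n_samples, 1) gives one weight per
    sample, shape (n_samples, n_species) gives one weight per species. A gate of 1
    uses only the latent component, a gate of 0 only the MLP component.
    """
    blended = gate * latent_logits + (1.0 - gate) * mlp_logits
    return softmax(blended)


def cross_entropy(pred: np.ndarray, target: np.ndarray, eps: float = 1e-12) -> float:
    """Mean cross-entropy of predicted vs. observed relative abundances (rows sum to 1)."""
    return float(-(target * np.log(pred + eps)).sum(axis=-1).mean())


# Usage with random data: 4 samples, 5 species, an even 50/50 gate.
rng = np.random.default_rng(0)
latent_logits = rng.normal(size=(4, 5))
mlp_logits = rng.normal(size=(4, 5))
gate = np.full((4, 1), 0.5)
pred = predict_relative_abundance(latent_logits, mlp_logits, gate)
observed = softmax(rng.normal(size=(4, 5)))  # stand-in for observed read proportions
loss = cross_entropy(pred, observed)
```

Blending in logit space keeps every prediction a valid distribution regardless of the gate value, since the softmax is applied last.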
- src/: Core architecture and training code
  - train.py: Main entry point for training and evaluation
  - config.py: Default experiment configuration
  - dataset.py: Dataset loading
  - model.py: Model architecture definitions (latent and MLP modules)
  - mlp.py: MLP-specific architectures
  - loss.py: Loss function definitions (cross-entropy and logistic)
  - latent_solver.py: Non-parametric latent solver implementation (precomputes interpolation operators, optimizes latent representations, etc.)
  - neighbor_graph.py: Builds neighbor lists, computes interpolation weights, and handles barcode embedding when applicable
  - gating_functions.py: Defines a variety of gating functions that combine latent and MLP predictions (when both are vectors)
  - utils.py: Data loading and preprocessing utilities
- analysis/: Experiment variants, cluster launch scripts, and visualization helpers
  - One subdirectory per experiment variant (e.g. baselines/, ablation/)
  - submit_subanalysis.sh: Unified SLURM launcher
  - LAUNCHING.md: Detailed cluster usage
  - visualize_results.py: Plotting and result analysis
- data/: Input data files and EDA notebooks
- autoresearch/: Adaptation of the AutoResearch framework for this project, including the training loop, agent instructions, and experiment-management utilities
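The src/ listing describes neighbor_graph.py as building neighbor lists and interpolation weights, and latent_solver.py as precomputing interpolation operators. A minimal sketch of that pattern, assuming k-nearest neighbors in a Euclidean space (for example, sample coordinates) with inverse-distance weights; the weighting scheme, the distance metric, and the function names are assumptions for illustration, not the repository's implementation. The operator is built once, after which interpolating latents is a single matrix product.

```python
import numpy as np


def knn_interpolation_weights(
    query: np.ndarray, reference: np.ndarray, k: int = 3, eps: float = 1e-9
) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical neighbor-list builder: k nearest reference points per query.

    Returns (indices, weights), both of shape (n_query, k). Weights are inverse
    Euclidean distances normalized to sum to 1 per query; eps avoids division by
    zero when a query coincides with a reference point.
    """
    # Pairwise distances, shape (n_query, n_reference).
    dists = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :k]
    nearest = np.take_along_axis(dists, idx, axis=1)
    inv = 1.0 / (nearest + eps)
    return idx, inv / inv.sum(axis=1, keepdims=True)


def interpolation_operator(idx: np.ndarray, weights: np.ndarray, n_reference: int) -> np.ndarray:
    """Precompute a dense (n_query, n_reference) matrix W so that W @ Z interpolates latents Z."""
    W = np.zeros((idx.shape[0], n_reference))
    rows = np.arange(idx.shape[0])[:, None]  # broadcasts against idx
    W[rows, idx] = weights
    return W


# Usage: 4 reference sites in 2-D, one query site near the origin.
reference = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
query = np.array([[0.1, 0.1]])
idx, weights = knn_interpolation_weights(query, reference, k=3)
W = interpolation_operator(idx, weights, len(reference))
latents = np.arange(8.0).reshape(4, 2)  # one 2-D latent vector per reference site
interpolated = W @ latents  # shape (1, 2)
```

A dense W is fine for small examples; with many samples, each row has only k nonzeros, so a sparse matrix (e.g. scipy.sparse) would be the natural storage choice.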
Run a baseline training job from the Metabarcoding/ directory:

python src/train.py --model baseline

Run any other model variant:

python src/train.py --model <variant_name>

Resume from the latest checkpoint of a variant:

python src/train.py --model <variant_name> --resume

To launch jobs on the cluster, list the available targets and then submit one, from Metabarcoding/analysis:

./submit_subanalysis.sh --list-targets
./submit_subanalysis.sh --target location_embedding

For full cluster workflow details, see analysis/LAUNCHING.md.