nf-core · nourmahfel · Sep 15, 2025
diff --git a/README.md b/README.md
@@ -1,38 +1,38 @@
-# ![nfcore/test-datasets](docs/images/test-datasets_logo.png)
-Test data to be used for automated testing with the nf-core pipelines
+# Rare Disease Test Datasets  
 
-> ⚠️ **Do not merge your test data to `master`! Each pipeline has a dedicated branch (and a special one for modules)**
+This repository contains subsampled long-read sequencing datasets tailored for rare disease analysis.  
 
-## Introduction
 
-nf-core is a collection of high quality Nextflow pipelines. This repository contains various files for CI and unit testing of nf-core pipelines and infrastructure.
+---
 
-The principle for nf-core test data is as small as possible, as large as necessary. Please see the [guidelines](https://nf-co.re/docs/contributing/test_data_guidelines) for more detailed information. Always ask for guidance on the [nf-core slack](https://nf-co.re/join) before adding new test data.
+## Contents  
 
-## Documentation
+- `bam_pass/` – subsampled aligned BAM files for variant calling tests  
+- `spectre/` – VCF files and BED regions for whole-genome CNV testing  
+- `straglr/` – Chromosome 22 STR test regions  
+- `test.exclude.bed` – CNV test exclude regions  
+- `reference/` – reduced human genome references   
+- `samplesheet_*.csv` – metadata for pipeline test runs  
 
-nf-core/test-datasets comes with documentation in the `docs/` directory:
+---
 
-01. [Add a new  test dataset](https://github.com/nf-core/test-datasets/blob/master/docs/ADD_NEW_DATA.md)
-02. [Use an existing test dataset](https://github.com/nf-core/test-datasets/blob/master/docs/USE_EXISTING_DATA.md)
+## Sample Overview  
 
-## Downloading test data
+| Sample ID | File type   | Size (approx.) | Purpose                                      |  
+|-----------|-------------|----------------|----------------------------------------------|  
+| Test      | BAM         | ~100 MB        | End-to-end pipeline testing from alignment (minimap2) through variant analysis | 
+| Reference | FASTA / BED | <5 MB          | Subset references (Chromosome 22) for rare disease test runs                   |  
 
-Due the large number of large files in this repository for each pipeline, we highly recommend cloning only the branches you would use.
+---
 
-```bash
-git clone <url> --single-branch --branch <pipeline/modules/branch_name>
-```
-
-To subsequently clone other branches[^1]
+## Usage  
 
-```bash
-git remote set-branches --add origin [remote-branch]
-git fetch
-```
+These datasets are intended for automated testing of the long-read rare disease pipeline (https://github.com/nf-core/longraredisease).  
 
-## Support
+The data in this repository are used to test the pipeline starting from unaligned BAM files, which are aligned with minimap2.  
+The parameters and settings for the test run can be found in the **test.config** file.  
 
-For further information or help, don't hesitate to get in touch on our [Slack organisation](https://nf-co.re/join/slack) (a tool for instant messaging).
+Example run:  
 
-[^1]: From [stackoverflow](https://stackoverflow.com/a/60846265/11502856)
+```bash
+nextflow run nf-core/nanoraredx -profile test,docker
diff --git a/genome_22/genome_22.fasta b/genome_22/genome_22.fasta
diff --git a/genome_22/genome_22.fasta.fai b/genome_22/genome_22.fasta.fai
@@ -0,0 +1 @@
+chr22	40001	7	60	61
diff --git a/samplesheet.csv b/samplesheet.csv
@@ -0,0 +1,2 @@
+sample_id,bam_path,fastq_dir,aligned_bam,methyl_bam,hpo_terms
+test,https://raw.githubusercontent.com/nourmahfel/test-datasets/longraredisease/unmapped_bam/test.bam,,,,HP:0002721;HP:0002110;HP:0500093;HP:0000717;HP:0001263;HP:0001763;HP:0003298;HP:0002857;HP:0001382
diff --git a/spectre/mosdepth.regions.bed.gz b/spectre/mosdepth.regions.bed.gz
diff --git a/spectre/test_clair3_merge_output.vcf.gz b/spectre/test_clair3_merge_output.vcf.gz
diff --git a/straglr/str.test.bed b/straglr/str.test.bed
@@ -0,0 +1 @@
+chr22	45795354	45795424	ATTCT
diff --git a/test.exclude.bed b/test.exclude.bed
diff --git a/unmapped_bam/test.bam b/unmapped_bam/test.bam
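The single line added to `genome_22/genome_22.fasta.fai` follows the standard five-column FASTA index layout (name, length, offset, bases per line, bytes per line). As a rough sanity check, the sketch below parses that line and computes where a given base sits in the FASTA file. The names `FaiRecord`, `parse_fai_line` and `byte_offset` are hypothetical helpers for illustration, not part of this PR or of any pipeline code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaiRecord:
    name: str
    length: int     # total number of bases in the sequence
    offset: int     # byte offset of the first base in the FASTA file
    linebases: int  # bases per full line
    linewidth: int  # bytes per full line, including the line terminator


def parse_fai_line(line: str) -> FaiRecord:
    """Parse one tab-separated .fai line into a FaiRecord."""
    name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")
    return FaiRecord(name, int(length), int(offset), int(linebases), int(linewidth))


def byte_offset(rec: FaiRecord, pos0: int) -> int:
    """Return the byte offset in the FASTA file of the 0-based base pos0."""
    if not 0 <= pos0 < rec.length:
        raise IndexError(f"{pos0} is outside {rec.name} (length {rec.length})")
    full_lines, within_line = divmod(pos0, rec.linebases)
    return rec.offset + full_lines * rec.linewidth + within_line


# The index line exactly as it appears in this diff.
rec = parse_fai_line("chr22\t40001\t7\t60\t61")
print(rec)
print(byte_offset(rec, 0))   # first base sits right after the header line
print(byte_offset(rec, 60))  # first base of the second sequence line
```

An offset of 7 matches a header line of `>chr22` followed by a newline, and a line width of 61 against 60 bases per line suggests single-byte (LF) line endings.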
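For reviewers who want to check `samplesheet.csv` locally, here is a minimal sketch that reads the two lines added in this diff with Python's standard `csv` module. It assumes that empty cells mean "not provided" and that `hpo_terms` is a semicolon-separated list, which is how the row appears here; how the pipeline itself parses the sheet is not shown in this PR. `read_samplesheet` is a hypothetical helper name.

```python
import csv
import io

# Contents of samplesheet.csv as added in this diff.
SAMPLESHEET = (
    "sample_id,bam_path,fastq_dir,aligned_bam,methyl_bam,hpo_terms\n"
    "test,https://raw.githubusercontent.com/nourmahfel/test-datasets/"
    "longraredisease/unmapped_bam/test.bam,,,,"
    "HP:0002721;HP:0002110;HP:0500093;HP:0000717;HP:0001263;"
    "HP:0001763;HP:0003298;HP:0002857;HP:0001382\n"
)


def read_samplesheet(text: str) -> list[dict]:
    """Parse samplesheet text into a list of per-sample dictionaries."""
    samples = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: an empty cell means the input was not provided.
        sample = {key: (value or None) for key, value in row.items()}
        # Assumption: HPO terms are separated by semicolons.
        sample["hpo_terms"] = [t for t in (row["hpo_terms"] or "").split(";") if t]
        samples.append(sample)
    return samples


samples = read_samplesheet(SAMPLESHEET)
for sample in samples:
    provided = [k for k, v in sample.items() if v and k not in ("sample_id", "hpo_terms")]
    print(sample["sample_id"], provided, len(sample["hpo_terms"]))
```

Only `bam_path` is populated for the `test` sample, which is consistent with the README statement that testing starts from unaligned BAM files.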