48 changes: 24 additions & 24 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,38 +1,38 @@
# ![nfcore/test-datasets](docs/images/test-datasets_logo.png)
Test data to be used for automated testing with the nf-core pipelines
# Rare Disease Test Datasets

> ⚠️ **Do not merge your test data to `master`! Each pipeline has a dedicated branch (and a special one for modules)**
This repository contains subsampled long-read sequencing datasets tailored for rare disease analysis.

## Introduction

nf-core is a collection of high quality Nextflow pipelines. This repository contains various files for CI and unit testing of nf-core pipelines and infrastructure.
---

The principle for nf-core test data is as small as possible, as large as necessary. Please see the [guidelines](https://nf-co.re/docs/contributing/test_data_guidelines) for more detailed information. Always ask for guidance on the [nf-core slack](https://nf-co.re/join) before adding new test data.
## Contents

## Documentation
- `bam_pass/` – subsampled aligned BAM files for variant calling tests
- `spectre/` – VCF files and BED regions for whole-genome CNV testing
- `straglr/` – Chromosome 22 STR test regions
- `test.exclude.bed` – CNV test exclude regions
- `reference/` – reduced human genome references
- `samplesheet_*.csv` – metadata for pipeline test runs

nf-core/test-datasets comes with documentation in the `docs/` directory:
---

01. [Add a new test dataset](https://github.com/nf-core/test-datasets/blob/master/docs/ADD_NEW_DATA.md)
02. [Use an existing test dataset](https://github.com/nf-core/test-datasets/blob/master/docs/USE_EXISTING_DATA.md)
## Sample Overview

## Downloading test data
| Sample ID | File type | Size (approx.) | Purpose |
|-----------|-------------|----------------|----------------------------------------------|
| Test | BAM | ~100 MB | End-to-end pipeline testing from alignment (minimap2) through variant analysis |
| Reference | FASTA / BED | <5 MB | Subset references (Chromosome 22) for rare disease test runs |

Due to the large number of large files in this repository for each pipeline, we highly recommend cloning only the branches you would use.
---

```bash
git clone <url> --single-branch --branch <pipeline/modules/branch_name>
```

To subsequently clone other branches[^1]
## Usage

```bash
git remote set-branches --add origin [remote-branch]
git fetch
```
These datasets are intended for automated testing of the long-read rare disease pipeline (https://github.com/nf-core/longraredisease).

## Support
The test run starts from unaligned BAM files, which the pipeline aligns with minimap2.
The parameters and settings for this run are defined in the `test.config` file.

For further information or help, don't hesitate to get in touch on our [Slack organisation](https://nf-co.re/join/slack) (a tool for instant messaging).
Example run:

[^1]: From [stackoverflow](https://stackoverflow.com/a/60846265/11502856)
```bash
nextflow run nf-core/longraredisease -profile test,docker
```
668 changes: 668 additions & 0 deletions genome_22/genome_22.fasta

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions genome_22/genome_22.fasta.fai
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
chr22 40001 7 60 61
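The single index record above follows the `samtools faidx` column layout (name, sequence length, byte offset of the first base, bases per line, bytes per line). As a rough sketch of how those five numbers are used, the Python below parses the record and computes where a given base sits in the FASTA file. The names `FaiRecord`, `parse_fai_line`, and `byte_offset` are made up for this illustration and are not part of any pipeline code.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    """One line of a .fai index (hypothetical helper type for this sketch)."""
    name: str
    length: int      # total number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases per full sequence line
    line_width: int  # bytes per full sequence line, including the newline


def parse_fai_line(line: str) -> FaiRecord:
    """Split a whitespace- or tab-separated .fai line into typed fields."""
    name, length, offset, line_bases, line_width = line.split()
    return FaiRecord(name, int(length), int(offset), int(line_bases), int(line_width))


def byte_offset(rec: FaiRecord, pos: int) -> int:
    """Return the file offset of the 0-based base position `pos`."""
    if not 0 <= pos < rec.length:
        raise IndexError(f"position {pos} outside {rec.name} (length {rec.length})")
    full_lines, within_line = divmod(pos, rec.line_bases)
    return rec.offset + full_lines * rec.line_width + within_line


rec = parse_fai_line("chr22 40001 7 60 61")

# Number of sequence lines needed for 40001 bases at 60 bases per line
# (ceiling division).
sequence_lines = -(-rec.length // rec.line_bases)

print(rec)
print("sequence lines:", sequence_lines)
print("offset of first base:", byte_offset(rec, 0))
print("offset of last base:", byte_offset(rec, rec.length - 1))
```

The 667 sequence lines plus one header line account for the 668 added lines reported for `genome_22/genome_22.fasta`, and the offset of 7 equals the length of the `>chr22` header line including its newline.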
2 changes: 2 additions & 0 deletions samplesheet.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
sample_id,bam_path,fastq_dir,aligned_bam,methyl_bam,hpo_terms
test,https://raw.githubusercontent.com/nourmahfel/test-datasets/longraredisease/unmapped_bam/test.bam,,,,HP:0002721;HP:0002110;HP:0500093;HP:0000717;HP:0001263;HP:0001763;HP:0003298;HP:0002857;HP:0001382
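To make the samplesheet layout concrete, here is a minimal Python sketch that reads the two lines above and splits the semicolon-separated `hpo_terms` column. It assumes HPO identifiers take the form `HP:` followed by seven digits; the function name `read_samplesheet` is hypothetical, and the pipeline's own samplesheet validation may differ.

```python
import csv
import io
import re

# The samplesheet from this diff, embedded so the sketch is self-contained.
SAMPLESHEET = (
    "sample_id,bam_path,fastq_dir,aligned_bam,methyl_bam,hpo_terms\n"
    "test,https://raw.githubusercontent.com/nourmahfel/test-datasets/"
    "longraredisease/unmapped_bam/test.bam,,,,"
    "HP:0002721;HP:0002110;HP:0500093;HP:0000717;HP:0001263;"
    "HP:0001763;HP:0003298;HP:0002857;HP:0001382\n"
)

# Assumed identifier shape: "HP:" plus seven digits.
HPO_ID = re.compile(r"HP:\d{7}")


def read_samplesheet(text: str) -> list[dict]:
    """Parse samplesheet text and turn `hpo_terms` into a list of IDs."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        terms = [term for term in row["hpo_terms"].split(";") if term]
        malformed = [term for term in terms if not HPO_ID.fullmatch(term)]
        if malformed:
            raise ValueError(f"{row['sample_id']}: malformed HPO terms {malformed}")
        row["hpo_terms"] = terms
        rows.append(row)
    return rows


rows = read_samplesheet(SAMPLESHEET)
for row in rows:
    # Columns left empty in the samplesheet are inputs this sample does not provide.
    provided = [key for key, value in row.items() if value and key != "hpo_terms"]
    print(row["sample_id"], provided, len(row["hpo_terms"]), "HPO terms")
```

For the `test` sample, only `bam_path` is filled in; `fastq_dir`, `aligned_bam`, and `methyl_bam` are empty, which is consistent with a run that starts from the unmapped BAM.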
Binary file added spectre/mosdepth.regions.bed.gz
Binary file not shown.
Binary file added spectre/test_clair3_merge_output.vcf.gz
Binary file not shown.
1 change: 1 addition & 0 deletions straglr/str.test.bed
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
chr22 45795354 45795424 ATTCT
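The STR region above can be read as a four-column BED record (chromosome, start, end, repeat motif). The sketch below, which assumes the standard 0-based, half-open BED coordinates, computes the span of the region and how many copies of the motif fit into it. `StrLocus` and `parse_str_bed_line` are illustrative names only.

```python
from typing import NamedTuple


class StrLocus(NamedTuple):
    """One STR region from a 4-column BED file (hypothetical helper type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    motif: str


def parse_str_bed_line(line: str) -> StrLocus:
    """Split a whitespace- or tab-separated BED line into typed fields."""
    chrom, start, end, motif = line.split()
    locus = StrLocus(chrom, int(start), int(end), motif)
    if locus.start >= locus.end:
        raise ValueError(f"empty or inverted interval: {line!r}")
    return locus


locus = parse_str_bed_line("chr22 45795354 45795424 ATTCT")

# With half-open coordinates the span is simply end minus start.
span = locus.end - locus.start
copies, leftover = divmod(span, len(locus.motif))

print(f"{locus.chrom}:{locus.start}-{locus.end} spans {span} bp")
print(f"{copies} copies of {locus.motif} with {leftover} bp left over")
```

Here the 70 bp interval holds exactly 14 copies of the 5 bp motif `ATTCT`.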
Empty file added test.exclude.bed
Empty file.
Binary file added unmapped_bam/test.bam
Binary file not shown.