Merged
4 changes: 4 additions & 0 deletions Cargo.toml
@@ -13,6 +13,10 @@ categories = ["command-line-utilities", "science"]
homepage = "https://seqeralabs.github.io/RustQC/"
exclude = ["benchmark/", "docs/", "paper/", "tests/", ".github/", "Dockerfile", ".dockerignore", ".pre-commit-config.yaml", "netlify.toml", "CONTRIBUTING.md", "AGENTS.md"]

[lib]
name = "rustqc"
path = "src/lib.rs"

[[bin]]
name = "rustqc"
path = "src/main.rs"
11 changes: 11 additions & 0 deletions README.md
@@ -82,6 +82,17 @@ cargo install rustqc

See the [documentation](https://seqeralabs.github.io/RustQC/) for full usage details, configuration options, output file descriptions, and benchmark results.

## Use as a Rust library

The crate is also published as a library, so the QC analysis modules (GTF parsing, dupRadar, featureCounts, RSeQC, Qualimap, preseq, samtools-style outputs) can be embedded into other Rust programs:

```toml
[dependencies]
rustqc = "0.2"
```

See the [library guide](https://seqeralabs.github.io/RustQC/usage/library/) and the full API reference on [docs.rs/rustqc](https://docs.rs/rustqc).

## AI & Provenance

RustQC was developed with substantial assistance from AI coding agents (primarily [Claude](https://claude.ai/)), using the upstream tool source code as reference. Correctness is validated by comparing output against the original tools on real sequencing data, not by manual code review alone. See the [AI & Provenance](https://seqeralabs.github.io/RustQC/about/ai-statement/) documentation for full details, including known validation gaps.
1 change: 1 addition & 0 deletions docs/astro.config.mjs
@@ -57,6 +57,7 @@ export default defineConfig({
slug: "usage/configuration",
},
{ label: "Performance & Tuning", slug: "usage/performance" },
{ label: "Rust Library", slug: "usage/library" },
],
},
{
102 changes: 102 additions & 0 deletions docs/src/content/docs/usage/library.mdx
@@ -0,0 +1,102 @@
---
title: Rust Library
description: Use RustQC as a Rust library crate, embedding its QC analysis modules in your own programs.
---

import { Aside } from "@astrojs/starlight/components";

RustQC is published on [crates.io](https://crates.io/crates/rustqc) as both a
binary and a library. The CLI (`rustqc rna ...`) is the primary interface, but
the same analysis modules are also exposed as a library so they can be embedded
into other Rust programs.

Full API reference: **[docs.rs/rustqc](https://docs.rs/rustqc)**.

## Adding RustQC as a dependency

```toml
[dependencies]
rustqc = "0.2.1" # or the latest release on crates.io
```

Building requires a working C/C++ toolchain (`cc`, `c++`), because
`rust-htslib` is linked statically and a small C++ component used by the preseq
module is compiled from source. The library adds no runtime dependencies beyond
those the binary already needs.

## What's in the library

The crate exposes these modules:

| Module | Contents |
| ----------------------------- | --------------------------------------------------------------------------------------------------------- |
| [`gtf`][docs-gtf] | GTF gene-annotation parsing. `Gene`, `Transcript`, `Exon`, `parse_gtf`. |
| [`io`][docs-io] | Transparent gzip-aware reader, FNV-1a hashing, number formatters. |
| [`config`][docs-config] | Configuration types mirroring the CLI's YAML config file. |
| [`summary`][docs-summary] | Serializable types for the JSON run summary. |
| [`cpu`][docs-cpu] | CPU feature detection and binary-target identification. |
| [`rna`][docs-rna] | RNA-Seq analyses: `dupradar`, `featurecounts`, `qualimap`, `preseq`, `rseqc`. |

[`Strandedness`][docs-strandedness] lives at the crate root because it is used
across most analysis modules.

[docs-gtf]: https://docs.rs/rustqc/latest/rustqc/gtf/
[docs-io]: https://docs.rs/rustqc/latest/rustqc/io/
[docs-config]: https://docs.rs/rustqc/latest/rustqc/config/
[docs-summary]: https://docs.rs/rustqc/latest/rustqc/summary/
[docs-cpu]: https://docs.rs/rustqc/latest/rustqc/cpu/
[docs-rna]: https://docs.rs/rustqc/latest/rustqc/rna/
[docs-strandedness]: https://docs.rs/rustqc/latest/rustqc/enum.Strandedness.html
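
The `io` row above mentions FNV-1a hashing. For readers new to it, here is a
minimal, std-only sketch of the standard 64-bit FNV-1a algorithm: start from a
fixed offset basis, then for each byte XOR it into the hash and multiply by the
FNV prime, wrapping modulo 2^64. This page does not show which width or
signature `rustqc::io` actually exposes, so `fnv1a_64` below is a hypothetical
stand-in, not the crate's API; see docs.rs for the real function.

```rust
/// 64-bit FNV-1a parameters from the FNV specification.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical helper (not rustqc's API): 64-bit FNV-1a over a byte slice.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in bytes {
        // FNV-1a XORs the byte in first, then multiplies (FNV-1 does the reverse).
        hash ^= u64::from(byte);
        // The multiply is defined modulo 2^64, so wrapping is intended.
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

fn main() {
    // Empty input leaves the offset basis untouched.
    assert_eq!(fnv1a_64(b""), FNV_OFFSET_BASIS);
    // Published 64-bit FNV-1a test vector for "a".
    assert_eq!(fnv1a_64(b"a"), 0xaf63_dc4c_8601_ec8c);
    println!("{:016x}", fnv1a_64(b"ENSG00000141510"));
}
```

FNV-1a is cheap on short keys such as gene IDs, which is a common reason to
prefer it over the standard library's default SipHash for hash-map keys. It is
not designed to resist adversarially chosen inputs.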

## Quick examples

Parse a GTF file:

```rust
use rustqc::gtf;

fn main() -> anyhow::Result<()> {
    let genes = gtf::parse_gtf("genes.gtf", &[])?;
    println!("{} genes parsed", genes.len());
    // Print the first three genes with their transcript counts.
    for (gene_id, gene) in genes.iter().take(3) {
        println!("{gene_id}: {} transcripts", gene.transcripts.len());
    }
    Ok(())
}
```

Open a possibly-gzipped annotation or output file with one call:

```rust
use std::io::BufRead;
use rustqc::io::open_reader;

let reader = open_reader("counts.tsv.gz")?;
for line in reader.lines() {
println!("{}", line?);
}
# Ok::<(), anyhow::Error>(())
```
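
`open_reader` hides compression detection from the caller. One common way to do
that is to peek at the first two bytes of the stream for the gzip magic number
`0x1f 0x8b` (the same bytes `io.rs` stores in `GZIP_MAGIC`). The std-only sketch
below shows only that sniffing step, using `BufRead::fill_buf` so no bytes are
consumed. The helper name is hypothetical, decompression (which needs a crate
such as `flate2`) is left out, and this is not a copy of rustqc's implementation.

```rust
use std::io::{BufRead, BufReader, Read};

/// First two bytes of every gzip member (RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Hypothetical helper: does the buffered stream start with the gzip magic bytes?
/// `fill_buf` only peeks at the internal buffer, so nothing is consumed.
fn starts_with_gzip_magic<R: Read>(reader: &mut BufReader<R>) -> std::io::Result<bool> {
    let buf = reader.fill_buf()?;
    // Empty or one-byte buffers cannot match a two-byte prefix.
    Ok(buf.starts_with(&GZIP_MAGIC))
}

fn main() -> std::io::Result<()> {
    let plain: &[u8] = b"gene_id\tcount\nENSG0001\t42\n";
    let mut reader = BufReader::new(plain);
    assert!(!starts_with_gzip_magic(&mut reader)?);

    // The check consumed nothing, so the header line is still intact.
    let mut first = String::new();
    reader.read_line(&mut first)?;
    assert_eq!(first, "gene_id\tcount\n");

    // A gzip stream starts 1f 8b, then the compression method (08 = deflate).
    let gz_start: &[u8] = &[0x1f, 0x8b, 0x08];
    assert!(starts_with_gzip_magic(&mut BufReader::new(gz_start))?);
    Ok(())
}
```

Peeking keeps the original reader usable, so the caller can wrap it in a
decoder or read it directly after the check. A single `fill_buf` call can return
fewer than two bytes from some sources (pipes, for example), so a more robust
version would keep filling until two bytes are buffered or EOF is reached.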

`Strandedness` derives `serde::Deserialize` and clap's `ValueEnum`, so the same
type works in YAML configs and CLI parsers. Its `Display` output is the
lowercase protocol name:

```rust
use rustqc::Strandedness;

let s = Strandedness::Reverse;
assert_eq!(s.to_string(), "reverse");
```
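
The enum's documentation defines each protocol by which read lies on the
transcript strand: read 1 for `Forward`, read 2 for `Reverse` (e.g. dUTP
libraries). The sketch below turns that definition into a standalone predicate
for a single alignment. It uses a local copy of the enum so it compiles without
the crate; `strand_compatible` is a hypothetical helper that assumes read 2's
strand is flipped to recover the fragment strand. It is not rustqc's counting
code.

```rust
/// Local stand-in for `rustqc::Strandedness` so the sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strandedness {
    Unstranded,
    Forward,
    Reverse,
}

/// Hypothetical predicate: may a read aligned to the + strand (`read_on_plus`)
/// be counted for a gene on the + strand (`gene_on_plus`)?
/// `is_read2` is false for single-end reads.
fn strand_compatible(
    s: Strandedness,
    read_on_plus: bool,
    is_read2: bool,
    gene_on_plus: bool,
) -> bool {
    // Read 2 comes from the opposite end of the fragment, so flip it to get
    // the strand that read 1 (and hence the fragment) lies on.
    let fragment_on_plus = read_on_plus != is_read2;
    match s {
        Strandedness::Unstranded => true,
        // Read 1 lies on the transcript strand.
        Strandedness::Forward => fragment_on_plus == gene_on_plus,
        // Read 2 lies on the transcript strand, so read 1 is opposite.
        Strandedness::Reverse => fragment_on_plus != gene_on_plus,
    }
}

fn main() {
    // Read 2 aligned to +, gene annotated on +.
    // Prints "Unstranded: true", "Forward: false", "Reverse: true" on separate lines.
    for s in [Strandedness::Unstranded, Strandedness::Forward, Strandedness::Reverse] {
        println!("{s:?}: {}", strand_compatible(s, true, true, true));
    }
}
```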

## Stability

The library is at `0.2.x` and the public surface is intentionally small. Expect
breaking changes in minor releases until `1.0`. Module visibility may be
narrowed in future versions if internal types are inadvertently exposed.

<Aside type="note">
The full single-pass RNA-Seq pipeline (the `run_rna` orchestrator that the
binary uses) is not yet exposed as a library entry point. For now, library
consumers drive individual analyses themselves. Pipeline-level orchestration
may be exposed in a future release; track [issue #72][issue-72] for progress.
</Aside>

[issue-72]: https://github.com/seqeralabs/RustQC/issues/72
31 changes: 3 additions & 28 deletions src/cli.rs
@@ -10,34 +10,9 @@
//!
//! A GTF gene annotation file is required for all analyses.

use clap::{CommandFactory, Parser, Subcommand, ValueEnum};
use serde::Deserialize;
use clap::{CommandFactory, Parser, Subcommand};

/// Library strandedness protocol.
///
/// Determines how read strand is interpreted relative to the gene annotation
/// strand during counting. Accepted CLI values: `unstranded`, `forward`, `reverse`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, ValueEnum, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum Strandedness {
/// Count reads on either strand (library is not strand-specific).
#[default]
Unstranded,
/// Forward stranded: read 1 maps to the transcript strand.
Forward,
/// Reverse stranded: read 2 maps to the transcript strand (e.g. dUTP).
Reverse,
}

impl std::fmt::Display for Strandedness {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Strandedness::Unstranded => write!(f, "unstranded"),
Strandedness::Forward => write!(f, "forward"),
Strandedness::Reverse => write!(f, "reverse"),
}
}
}
use rustqc::Strandedness;

/// Fast quality control tools for sequencing data, written in Rust.
#[derive(Parser, Debug)]
@@ -407,7 +382,7 @@ pub fn parse_args() -> Cli {
env!("CARGO_PKG_VERSION"),
env!("GIT_SHORT_HASH"),
env!("BUILD_TIMESTAMP"),
crate::cpu::cpu_info_line(),
rustqc::cpu::cpu_info_line(),
)
.into_boxed_str(),
);
6 changes: 3 additions & 3 deletions src/config.rs
@@ -4,7 +4,7 @@
//! like chromosome name mappings between alignment file and GTF references,
//! per-tool output configuration, and tool enable/disable toggles.

use crate::cli::Strandedness;
use crate::Strandedness;
use anyhow::{Context, Result};
use serde::Deserialize;
use serde_yaml_ng::Value;
@@ -1213,7 +1213,7 @@ preseq:
deep_merge(&mut base, overlay);
let m = base.as_mapping().unwrap();
let items = m
.get(&Value::String("items".into()))
.get(Value::String("items".into()))
.unwrap()
.as_sequence()
.unwrap();
@@ -1268,7 +1268,7 @@ preseq:

let paths = collect_config_paths(Some("/tmp/nonexistent.yml"));
// The -c flag should always be last
assert!(paths.last().unwrap().0 == PathBuf::from("/tmp/nonexistent.yml"));
assert!(paths.last().unwrap().0 == Path::new("/tmp/nonexistent.yml"));
assert_eq!(paths.last().unwrap().1, "-c flag");

// Restore
105 changes: 105 additions & 0 deletions src/io.rs
@@ -9,6 +9,7 @@ use flate2::read::GzDecoder;
use std::fs::File;
use std::io::{BufRead, BufReader, Read, Seek};
use std::path::Path;
use std::time::Duration;

/// Gzip magic bytes: the first two bytes of any gzip-compressed file.
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
@@ -101,6 +102,56 @@ pub fn format_with_commas(n: u64) -> String {
result
}

/// Format a count with SI suffixes (e.g. "1.5K", "48.2M", "2.3G").
///
/// Used for compact human-readable counts in progress messages and summaries.
pub fn format_count(n: u64) -> String {
use number_prefix::NumberPrefix;
match NumberPrefix::decimal(n as f64) {
NumberPrefix::Standalone(n) => format!("{n}"),
NumberPrefix::Prefixed(prefix, n) => {
// Map SI prefixes to short single-char suffixes
let suffix = match prefix {
number_prefix::Prefix::Kilo => "K",
number_prefix::Prefix::Mega => "M",
number_prefix::Prefix::Giga => "G",
number_prefix::Prefix::Tera => "T",
_ => return format!("{:.1}{prefix:?}", n),
};
format!("{n:.1}{suffix}")
}
}
}

/// Format a percentage string (e.g. "(83.3%)").
pub fn format_pct(n: u64, total: u64) -> String {
if total == 0 {
return "(0.0%)".to_string();
}
format!("({:.1}%)", n as f64 / total as f64 * 100.0)
}

/// Format a duration as human-friendly seconds, m:ss, or h:mm:ss.
///
/// - Under 60s: `"45.2s"`
/// - Under 1h: `"1:23"`
/// - 1h or more: `"1:02:34"`
pub fn format_duration(d: Duration) -> String {
let total_secs = d.as_secs_f64();
if total_secs < 60.0 {
return format!("{total_secs:.1}s");
}
let total_secs = d.as_secs();
let hours = total_secs / 3600;
let minutes = (total_secs % 3600) / 60;
let seconds = total_secs % 60;
if hours > 0 {
format!("{hours}:{minutes:02}:{seconds:02}")
} else {
format!("{minutes}:{seconds:02}")
}
}

// ============================================================
// Numeric helpers
// ============================================================
@@ -181,6 +232,60 @@ mod tests {
assert_eq!(format_with_commas(1234567), "1,234,567");
}

#[test]
fn test_format_count_small() {
assert_eq!(format_count(0), "0");
assert_eq!(format_count(42), "42");
assert_eq!(format_count(999), "999");
}

#[test]
fn test_format_count_thousands() {
assert_eq!(format_count(1000), "1.0K");
assert_eq!(format_count(1500), "1.5K");
assert_eq!(format_count(50000), "50.0K");
}

#[test]
fn test_format_count_millions() {
assert_eq!(format_count(1_000_000), "1.0M");
assert_eq!(format_count(48_200_000), "48.2M");
assert_eq!(format_count(50_000_000), "50.0M");
}

#[test]
fn test_format_count_billions() {
assert_eq!(format_count(1_000_000_000), "1.0G");
assert_eq!(format_count(5_000_000_000), "5.0G");
}

#[test]
fn test_format_pct() {
assert_eq!(format_pct(833, 1000), "(83.3%)");
assert_eq!(format_pct(0, 0), "(0.0%)");
assert_eq!(format_pct(1000, 1000), "(100.0%)");
}

#[test]
fn test_format_duration_seconds() {
assert_eq!(format_duration(Duration::from_secs_f64(0.5)), "0.5s");
assert_eq!(format_duration(Duration::from_secs_f64(45.2)), "45.2s");
assert_eq!(format_duration(Duration::from_secs_f64(59.9)), "59.9s");
}

#[test]
fn test_format_duration_minutes() {
assert_eq!(format_duration(Duration::from_secs(60)), "1:00");
assert_eq!(format_duration(Duration::from_secs(83)), "1:23");
assert_eq!(format_duration(Duration::from_secs(3599)), "59:59");
}

#[test]
fn test_format_duration_hours() {
assert_eq!(format_duration(Duration::from_secs(3600)), "1:00:00");
assert_eq!(format_duration(Duration::from_secs(3754)), "1:02:34");
}

#[test]
fn test_open_reader_plain() {
let content = "line1\nline2\nline3\n";