Check us out here! 🧬

Microbial marker gene reference database for wastewater

Lou LaMartina, Angie Schmoldt, Ryan Newton

This database contains full-length 16S rRNA gene sequences spanning primers 27F to 1492R (regions V1-V9). DNA sequences from the PacBio Sequel II were curated with DADA2, mothur, and Silva v.138. Sample information, FASTA sequences, counts, and taxonomy are publicly available in multiple formats.

ASV files

Amplicon sequence variants (ASVs) are unique DNA sequences, here of 16S ribosomal RNA genes from wastewater bacteria. Counts files give the number of reads assigned to each ASV in each sample. Taxonomy files show the taxonomic classification of each ASV from Kingdom to Species. ASV names range from ASV0001 to ASV1041, ranked from most to least abundant. FASTA files contain the ASV sequences; their headers include ASV ID, taxonomic assignments, read count, and read direction (R1/R2).

Counts | GitHub | Google

Taxonomy | GitHub | Google

FASTA | GitHub
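The ASV naming convention described above (zero-padded IDs ranked from most to least abundant) can be sketched in a few lines of Python. This is only an illustration: the function name `rank_asv_names`, the input IDs, and the read totals are hypothetical, and breaking ties alphabetically is an assumption rather than something the repository documents.

```python
def rank_asv_names(total_reads, width=4):
    """Map each original sequence ID to ASV0001, ASV0002, ...

    total_reads: dict of {original_id: total read count across samples}.
    IDs are ranked by descending read count; ties are broken by ID
    (an assumption made here only so the ordering is deterministic).
    """
    ranked = sorted(total_reads, key=lambda k: (-total_reads[k], k))
    return {orig: f"ASV{i:0{width}d}" for i, orig in enumerate(ranked, start=1)}


# Hypothetical read totals, not taken from the database.
totals = {"seqA": 120, "seqB": 4500, "seqC": 980}
names = rank_asv_names(totals)
print(names)
```

With four digits of padding, the same scheme covers the full range of names in this database (ASV0001 to ASV1041).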

OTU files

Operational taxonomic units (OTUs) were generated by grouping ASVs that were at least 99.5% similar. OTU names range from OTU001 to OTU681, ranked from most to least abundant. If the ASVs within an OTU had no consensus taxonomy, the OTU is listed once per taxonomic assignment, and the percentage of the OTU's reads with that assignment is appended to its name. For example, OTU011 comprised 16 ASVs, all in the genus Acidovorax, but their species assignments were split among defluvii (11 ASVs), carolinensis (4), and unclassified (1). Of the 5568 reads in OTU011, 67% (3719) were assigned to defluvii, while carolinensis and unclassified accounted for 16% and 17%, respectively. The resulting OTU names are therefore OTU011_67, OTU011_17, and OTU011_16.

Counts | GitHub | Google

Taxonomy | GitHub | Google
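The percentage suffix in split OTU names can be reproduced with a short calculation. The sketch below is illustrative only: `otu_split_names` is a hypothetical helper, rounding to the nearest whole percent is an assumption, and only the defluvii count (3719) and the OTU total (5568) come from the text above. The carolinensis and unclassified counts are placeholders chosen to match the reported 16% and 17%.

```python
def otu_split_names(otu_id, reads_by_taxon):
    """Return {taxon: 'OTUxxx_NN'}, where NN is that taxon's share of the
    OTU's reads as a whole-number percentage (rounding is an assumption)."""
    total = sum(reads_by_taxon.values())
    return {
        taxon: f"{otu_id}_{round(100 * n / total)}"
        for taxon, n in reads_by_taxon.items()
    }


# 3719 and the total of 5568 are from the README; 900 and 949 are
# hypothetical values consistent with the reported percentages.
reads = {"defluvii": 3719, "carolinensis": 900, "unclassified": 949}
split_names = otu_split_names("OTU011", reads)
print(split_names)
```

Under these assumptions the helper yields OTU011_67 for defluvii, OTU011_16 for carolinensis, and OTU011_17 for unclassified.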

Raw files, R Data, and code

The Phyloseq files are R objects that combine ASV or OTU counts, taxonomy, and sample information for easy exploration in R. If you want to recreate the analysis, the code script and files from each step are included.

FASTQs | NCBI Short Read Archive

Phyloseq | ASV | OTU

trim residual primers | code | input & input

dereplicate trimmed reads | code | input & input

subset sewage samples | code | input & input

cluster ASVs to OTUs | code | input & input

assess taxonomy | code | input & input

Sample set

In total, 46 wastewater treatment plant influent (raw sewage) samples underwent 16S rRNA gene sequencing. According to previous studies (1, 2), the samples encompass a wide range of bacterial diversity over space and time. Temporally, 24 sewage samples were collected once a month for two years from a single treatment plant. Spatially, 22 treatment plants were sampled from across the US, with southern samples from summer and northern samples from winter.

Metadata | GitHub | Google

Analysis

Marker gene. Hypervariable and conserved regions (V1-V9) were PCR-amplified with primers 27F and 1492R. Unique barcodes were appended to the primers so that all samples could be sequenced simultaneously (multiplexed).
DNA sequencing. PCR amplicons were sequenced in multiplex on a PacBio Sequel II.
Data processing. Data files were split into individual samples according to their assigned barcodes. Cutadapt was used to trim primers and barcodes from reads, DADA2 generated ASV counts and assigned taxonomy, and mothur clustered ASVs into OTUs.
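To make the barcode step concrete, here is a toy sketch of how multiplexed reads can be assigned to samples by barcode. It is not the pipeline used for this database: the function `demultiplex`, the barcodes, the sample names, and the reads are all hypothetical, and real tools also handle sequencing errors, read orientation, and barcodes on both ends of a read.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by the barcode found at the start of each read.

    The matching barcode is removed from the read. Reads that match
    no barcode exactly are collected under 'unassigned'.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_1", "TTAG": "sample_2"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "CCCCGGGG"]
result = demultiplex(reads, barcodes)
print(result)
```

Exact prefix matching keeps the example short; it would discard any read with a single barcode error, which is one reason dedicated demultiplexing tools are used in practice.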

loulanomics/Full16S_sewageDatabase

Folders and files

Latest commit

History

Repository files navigation

Check us out here! 🧬

Microbial marker gene reference database for wastewater

ASV files

OTU files

Raw files, R Data, and code

Sample set

Analysis

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages