Check us out here! 🧬
Lou LaMartina, Angie Schmoldt, Ryan Newton
Full-length 16S rRNA gene sequences spanning regions V1-V9, amplified from primer 27F to primer 1492R. DNA sequences from the PacBio Sequel II were curated with DADA2, mothur, and Silva v.138. Sample information, FASTA sequences, counts, and taxonomy are publicly available in multiple formats.
Amplicon sequence variants (ASVs) are the unique DNA sequences of 16S ribosomal RNA genes recovered from wastewater bacteria. Counts files give the number of times (reads) that each ASV occurs in each sample. Taxonomy files show the taxonomic classification of ASVs from Kingdom to Species. ASV names range from ASV0001 to ASV1041, ranked from most to least abundant. FASTA files contain the ASV sequences, with headers that include ASV ID, taxonomic assignments, read count, and read direction (R1/R2).
FASTA | GitHub
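The rank-based naming described above can be illustrated with a minimal Python sketch. It is not the script used to build this dataset: the function name `name_asvs`, the tie-breaking rule, and the toy sequences and read totals are all assumptions made for illustration.

```python
def name_asvs(total_reads, prefix="ASV", width=4):
    """Map each sequence to a rank-based name, most abundant first.

    total_reads: dict of sequence -> total reads across all samples.
    Ties are broken alphabetically by sequence (an assumption, chosen
    only to make the output deterministic).
    """
    ranked = sorted(total_reads.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        seq: f"{prefix}{rank:0{width}d}"
        for rank, (seq, _) in enumerate(ranked, start=1)
    }


# Toy sequences and read totals, not taken from the dataset.
toy_totals = {"ACGT": 12, "TTGA": 340, "GGCA": 57}
toy_names = name_asvs(toy_totals)
print(toy_names["TTGA"])  # the most abundant sequence gets rank 1
```

Calling the same function with `prefix="OTU"` and `width=3` would produce three-digit names in the style of OTU001.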
Operational taxonomic units (OTUs) were generated by grouping ASVs that were at least 99.5% similar. OTU names range from OTU001 to OTU681, ranked from most to least abundant. If the ASVs within an OTU did not share a consensus taxonomy, the percentage of the OTU's reads belonging to each taxonomic assignment is appended to the OTU name. For example, OTU011 comprised 16 ASVs, all in the genus Acidovorax, but their species assignments were mixed: defluvii (11 ASVs), carolinensis (4), and unclassified to species (1). Of all the reads in OTU011 (5568), 67% (3719) were assigned to defluvii, while carolinensis and unclassified accounted for 16% and 17%, respectively. Therefore, the OTU names are OTU011_67, OTU011_17, and OTU011_16.
Phyloseq files are R objects that combine ASV or OTU counts, taxonomy, and sample information for easy exploration in R. If you want to recreate the analysis, output files from each step (code script) are included.
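As a rough check of the arithmetic behind these suffixes, here is a small Python sketch. It assumes percentages are rounded to the nearest whole number and that OTUs with a single taxonomic assignment keep their plain name. The helper names and the toy OTU are hypothetical, and only the 3719 and 5568 read counts come from the text above.

```python
def percent_of_reads(taxon_reads, otu_reads):
    """Percentage of an OTU's reads with one assignment, to the nearest integer."""
    return round(100 * taxon_reads / otu_reads)


def otu_names(otu_id, reads_by_taxon):
    """Return taxon -> OTU name, adding a percent suffix only when taxonomy is mixed."""
    total = sum(reads_by_taxon.values())
    if len(reads_by_taxon) == 1:
        # Consensus taxonomy: the name is left unsuffixed (an assumption).
        return {taxon: otu_id for taxon in reads_by_taxon}
    return {
        taxon: f"{otu_id}_{percent_of_reads(reads, total)}"
        for taxon, reads in reads_by_taxon.items()
    }


# Read counts quoted in the text: 3719 of the 5568 reads in OTU011.
defluvii_pct = percent_of_reads(3719, 5568)
print(defluvii_pct)

# Hypothetical OTU with made-up read counts, for illustration only.
toy = otu_names("OTU999", {"species_a": 750, "species_b": 250})
print(toy)
```

The per-species read counts for carolinensis and unclassified are not given in the text, so the sketch does not try to reproduce OTU011_16 or OTU011_17.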
FASTQs | NCBI Short Read Archive
trim residual primers | code | input & input
dereplicate trimmed reads | code | input & input
subset sewage samples | code | input & input
In total, 46 samples of wastewater treatment plant influent (raw sewage) underwent 16S rRNA gene sequencing. According to previous studies (1, 2), the samples encompass a wide range of bacterial diversity over space and time. Temporally, 24 sewage samples were collected once a month for two years from a single treatment plant. Spatially, 22 treatment plants were sampled from across the US, with southern samples collected in summer and northern samples collected in winter.
- Marker gene. The hypervariable and conserved regions of the 16S rRNA gene (V1-V9) were PCR-amplified with primers 27F and 1492R. Unique barcodes were appended to the primers so that all samples could be sequenced simultaneously (multiplex).
- DNA sequencing. PCR amplicons were sequenced in multiplex on a PacBio Sequel II.
- Data processing. Data files were split into individual samples according to their assigned barcodes. Cutadapt was used to trim primers and barcodes from reads, DADA2 generated ASV counts and assigned taxonomy, and mothur clustered ASVs into OTUs.