@McAllister-NOAA It looks like the sample_metadata files include sample E272_2B_NO20 but NCBI does not. There are also some mismatches where NCBI has "1B" as the middle part of the sample name and sample_metadata has "2B".
Here is how I checked:
```r
# Compare sample names in sample_metadata to NCBI
library(xml2)
library(dplyr)

# Load data
# NCBI sample names; XML downloaded by hand from NCBI.
NCBI <- read_xml(x = "documentation/PRJNA982176_biosample_result.xml") %>%
  xml_find_all(xpath = "//Id[@db_label='Sample name']") %>%
  as_list() %>%
  unlist() %>%
  sort()

sample_metadata <- read.table(file = "data/sample_metadata/sample_metadata_16S.txt",
                              sep = "\t",
                              header = TRUE) %>%
  pull(Sample) %>%
  sort()

# Compare sample names
NCBI
sample_metadata
which(!NCBI %in% sample_metadata)

# Most of the mismatches are just the middle string being 1B or 2B,
# so normalise 1B to 2B in both vectors before comparing again
NCBI <- sub(pattern = "1B", replacement = "2B", x = NCBI)
sample_metadata <- sub(pattern = "1B", replacement = "2B", x = sample_metadata)

# Everything in NCBI is in sample_metadata
NCBI[!NCBI %in% sample_metadata] %>% sort()

# One sample is missing from NCBI
sample_metadata[!sample_metadata %in% NCBI] %>% sort()
```
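If this check needs rerunning once the corrected submission is public, both directions of the comparison could be wrapped in one function that also lists the replicate-label mismatches separately. This is a hypothetical sketch: `compare_sample_names` is not from any package, and it assumes sample names follow the `PREFIX_REPLICATE_SUFFIX` pattern seen in `E272_2B_NO20`. Matching `"_1B_"` rather than a bare `"1B"` avoids rewriting a `1B` that happens to appear elsewhere in a name. The names in the example are invented toy data.

```r
# Hypothetical helper: compare two vectors of sample names, treating the
# replicate labels "1B" and "2B" as equivalent (assumes names like E272_2B_NO20).
compare_sample_names <- function(ncbi, local) {
  # Only rewrite "1B" when it is the underscore-delimited middle field
  normalise <- function(x) sub("_1B_", "_2B_", x, fixed = TRUE)
  ncbi_norm  <- normalise(ncbi)
  local_norm <- normalise(local)
  list(
    # Present in NCBI but absent locally, even after normalising replicates
    only_in_ncbi = sort(unique(ncbi[!ncbi_norm %in% local_norm])),
    # Present locally but absent from NCBI, even after normalising replicates
    only_in_local = sort(unique(local[!local_norm %in% ncbi_norm])),
    # NCBI names that only match once the replicate label is normalised
    replicate_mismatch = sort(unique(ncbi[!ncbi %in% local & ncbi_norm %in% local_norm]))
  )
}

# Toy data (invented names, not the real sample list)
ncbi_toy  <- c("E000_1B_NO10", "E000_2B_NO15")
local_toy <- c("E000_2B_NO10", "E000_2B_NO15", "E000_2B_NO20")
res <- compare_sample_names(ncbi_toy, local_toy)
res
```

With the real data, `NCBI` and `sample_metadata` from the script above would be passed in before the `sub()` step, so the replicate mismatches are reported rather than silently normalised away.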
Thanks for the info, Steve; it was very helpful for tracking down the problem, which was primarily that the wrong sequences were submitted to the SRA. Long story short: the submission contained both 16S and 18S data. The 18S is all correct, but for the 16S, some different replicates were chosen (the 1B/2B errors) and one sample was left out (the missing one). I have submitted a second round to NCBI to correct these errors and will update and close this issue when I have a public accession to share.