missing sample at NCBI #2

sformel-usgs · 2023-07-14T17:38:25Z

@McAllister-NOAA It looks like the sample_metadata files include sample E272_2B_NO20 but NCBI does not. There are also some mismatches where NCBI has "1B" as the middle part of the sample name and sample_metadata has "2B".

Here is how I checked:

```r
# Compare sample names in sample_metadata to NCBI

library(xml2)
library(dplyr)

# Load data

# NCBI sample names; XML downloaded manually from NCBI BioSample.
NCBI <- read_xml(x = "documentation/PRJNA982176_biosample_result.xml") %>% 
  xml_find_all(xpath = "//Id[@db_label='Sample name']") %>% 
  xml_text() %>%  # text content of each matching <Id> node
  sort()

sample_metadata <- read.table(file = "data/sample_metadata/sample_metadata_16S.txt",
                              sep = "\t",
                              header = TRUE) %>%
  pull(Sample) %>%
  sort()

# Compare sample names

NCBI
sample_metadata

# Positions of NCBI names that have no exact match in sample_metadata
which(!NCBI %in% sample_metadata)

# Many of the mismatches differ only in the middle token (1B vs 2B),
# so normalize 1B to 2B in both vectors before comparing again
# (sub() replaces only the first match in each name)
NCBI <- sub(pattern = "1B", replacement = "2B", x = NCBI, fixed = TRUE)
sample_metadata <- sub(pattern = "1B", replacement = "2B", x = sample_metadata, fixed = TRUE)

# Everything in NCBI is in sample_metadata
NCBI[!NCBI %in% sample_metadata] %>% sort()

# One sample is missing from NCBI
sample_metadata[!sample_metadata %in% NCBI] %>% sort()
```
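Because the 1B to 2B substitution above rewrites the names in place, it hides which names actually differed. A base R sketch like the one below keeps the raw set differences and separately flags names that match only after normalization. It assumes the replicate token is delimited by underscores (as in `E272_2B_NO20`); the function name `compare_sample_names` is hypothetical.

```r
# Hypothetical helper: compare two vectors of sample names.
# Assumes the replicate token looks like "_1B_" / "_2B_" (based on E272_2B_NO20).
compare_sample_names <- function(ncbi, metadata) {
  # Collapse the replicate token so "X_1B_Y" and "X_2B_Y" compare as equal
  normalize <- function(x) sub("_1B_", "_2B_", x, fixed = TRUE)

  # Exact set differences, before any normalization
  only_ncbi <- setdiff(ncbi, metadata)
  only_metadata <- setdiff(metadata, ncbi)

  list(
    only_ncbi = sort(only_ncbi),
    only_metadata = sort(only_metadata),
    # In NCBI under a different replicate token than in sample_metadata
    replicate_mismatch = sort(only_ncbi[normalize(only_ncbi) %in% normalize(metadata)]),
    # In sample_metadata with no NCBI counterpart even after normalization
    missing_from_ncbi = sort(only_metadata[!normalize(only_metadata) %in% normalize(ncbi)])
  )
}
```

Calling `compare_sample_names(NCBI, sample_metadata)` on the vectors as first loaded (before the `sub()` step) would list the 1B/2B pairs and the sample absent from NCBI separately.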


McAllister-NOAA · 2023-09-08T18:55:04Z

Thanks for the info, Steve; it was very helpful for tracking down the problem, which was primarily due to the wrong sequences being submitted to the SRA. Long story short: the submission contained both 16S and 18S data. The 18S data are all correct, but for 16S some different replicates were chosen (the 1B/2B errors) and one additional sample was affected (the missing one). I have submitted a second round to NCBI to correct these errors and will update and close this issue when I have a public accession to share.
