Rendering filament pages using NCBI dataset API #157

nekrut · 2024-11-05T20:07:19Z

This issue illustrates how the NCBI Datasets API can be used to generate the JSON blobs necessary for rendering filament pages (#130).

Linked Tickets

Doing API script in Read organism and genome info from NCBI #159
Genomes list exploration in UI Exploration on Genomes list #177

Data!

The initial set of taxa will be limited to these species: https://docs.google.com/spreadsheets/d/1Gg9sw2Qw765tOx2To53XkTAn-RAMiBtqYrfItlLXXrc/edit?usp=sharing

There is a separate issue on developing a data format for initializing the site: #201

List view

To populate this view we call the NCBI Datasets API to get additional info (not provided in the initialization JSON dataset):

curl -X POST "https://api.ncbi.nlm.nih.gov/datasets/v2/taxonomy/dataset_report" \
 -H 'accept: application/json' \
 -H 'content-type: application/json' \
 -d '{"taxons":["Plasmodium falciparum","Plasmodium vivax","Plasmodium yoelii","Plasmodium vinckei","Culex pipiens","Anopheles gambiae","Toxoplasma gondii","Mycobacterium tuberculosis","Coccidioides posadasii","Coccidioides immitis"],"children":false,"ranks":["genus"]}'

This generates the following response:

{
  "reports": [
    {
      "taxonomy": {
        "tax_id": 7165,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Anopheles gambiae",
          "authority": "Giles, 1902"
        },
        "curator_common_name": "African malaria mosquito",
        "group_name": "mosquitos",
        "classification": {
          "superkingdom": {
            "name": "Eukaryota",
            "id": 2759
          },
          "kingdom": {
            "name": "Metazoa",
            "id": 33208
          },
          "phylum": {
            "name": "Arthropoda",
            "id": 6656
          },
          "class": {
            "name": "Insecta",
            "id": 50557
          },
          "order": {
            "name": "Diptera",
            "id": 7147
          },
          "family": {
            "name": "Culicidae",
            "id": 7157
          },
          "genus": {
            "name": "Anopheles",
            "id": 7164
          },
          "species": {
            "name": "Anopheles gambiae",
            "id": 7165
          }
        },
        "parents": [
          1,
          131567,
          2759,
          33154,
          33208,
          6072,
          33213,
          33317,
          1206794,
          88770,
          6656,
          197563,
          197562,
          6960,
          50557,
          85512,
          7496,
          33340,
          33392,
          7147,
          7148,
          43786,
          41827,
          7157,
          43816,
          7164,
          44534,
          44537,
          44542
        ],
        "children": [
          180454
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 7
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 15164
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 422
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 615
          },
          {
            "type": "COUNT_TYPE_snRNA",
            "count": 27
          },
          {
            "type": "COUNT_TYPE_snoRNA",
            "count": 11
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 12518
          },
          {
            "type": "COUNT_TYPE_ncRNA",
            "count": 1209
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Anopheles gambiae"
      ]
    },
    {
      "taxonomy": {
        "tax_id": 5501,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Coccidioides immitis",
          "authority": "G.W. Stiles, 1896"
        },
        "group_name": "ascomycete fungi",
        "has_type_material": true,
        "classification": {
          "superkingdom": {
            "name": "Eukaryota",
            "id": 2759
          },
          "kingdom": {
            "name": "Fungi",
            "id": 4751
          },
          "phylum": {
            "name": "Ascomycota",
            "id": 4890
          },
          "class": {
            "name": "Eurotiomycetes",
            "id": 147545
          },
          "order": {
            "name": "Onygenales",
            "id": 33183
          },
          "family": {
            "name": "Onygenaceae",
            "id": 33184
          },
          "genus": {
            "name": "Coccidioides",
            "id": 5500
          },
          "species": {
            "name": "Coccidioides immitis",
            "id": 5501
          }
        },
        "parents": [
          1,
          131567,
          2759,
          33154,
          4751,
          451864,
          4890,
          716545,
          147538,
          716546,
          147545,
          451871,
          33183,
          33184,
          5500
        ],
        "children": [
          246410,
          454286,
          404692,
          396776
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 5
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 9974
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 147
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 29
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 9797
          },
          {
            "type": "COUNT_TYPE_ncRNA",
            "count": 1
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Coccidioides immitis"
      ]
    },
    {
      "taxonomy": {
        "tax_id": 199306,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Coccidioides posadasii",
          "authority": "M.C. Fisher, G.L. Koenig, T.J. White & J.W. Taylor, 2002"
        },
        "group_name": "ascomycete fungi",
        "has_type_material": true,
        "classification": {
          "superkingdom": {
            "name": "Eukaryota",
            "id": 2759
          },
          "kingdom": {
            "name": "Fungi",
            "id": 4751
          },
          "phylum": {
            "name": "Ascomycota",
            "id": 4890
          },
          "class": {
            "name": "Eurotiomycetes",
            "id": 147545
          },
          "order": {
            "name": "Onygenales",
            "id": 33183
          },
          "family": {
            "name": "Onygenaceae",
            "id": 33184
          },
          "genus": {
            "name": "Coccidioides",
            "id": 5500
          },
          "species": {
            "name": "Coccidioides posadasii",
            "id": 199306
          }
        },
        "parents": [
          1,
          131567,
          2759,
          33154,
          4751,
          451864,
          4890,
          716545,
          147538,
          716546,
          147545,
          451871,
          33183,
          33184,
          5500
        ],
        "children": [
          443226,
          469471
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 13
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 8510
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 163
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 2
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 8342
          },
          {
            "type": "COUNT_TYPE_ncRNA",
            "count": 1
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Coccidioides posadasii"
      ]
    },
    {
      "taxonomy": {
        "tax_id": 7175,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Culex pipiens",
          "authority": "Linnaeus, 1758"
        },
        "curator_common_name": "northern house mosquito",
        "group_name": "mosquitos",
        "classification": {
          "superkingdom": {
            "name": "Eukaryota",
            "id": 2759
          },
          "kingdom": {
            "name": "Metazoa",
            "id": 33208
          },
          "phylum": {
            "name": "Arthropoda",
            "id": 6656
          },
          "class": {
            "name": "Insecta",
            "id": 50557
          },
          "order": {
            "name": "Diptera",
            "id": 7147
          },
          "family": {
            "name": "Culicidae",
            "id": 7157
          },
          "genus": {
            "name": "Culex",
            "id": 7174
          },
          "species": {
            "name": "Culex pipiens",
            "id": 7175
          }
        },
        "parents": [
          1,
          131567,
          2759,
          33154,
          33208,
          6072,
          33213,
          33317,
          1206794,
          88770,
          6656,
          197563,
          197562,
          6960,
          50557,
          85512,
          7496,
          33340,
          33392,
          7147,
          7148,
          43786,
          41827,
          7157,
          43817,
          53550,
          7174,
          53527,
          518105
        ],
        "children": [
          1833972,
          38569,
          42434,
          233155
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 5
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 19673
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 686
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 155
          },
          {
            "type": "COUNT_TYPE_snRNA",
            "count": 58
          },
          {
            "type": "COUNT_TYPE_snoRNA",
            "count": 9
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 16298
          },
          {
            "type": "COUNT_TYPE_ncRNA",
            "count": 1620
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Culex pipiens"
      ]
    },
    {
      "taxonomy": {
        "tax_id": 1773,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Mycobacterium tuberculosis",
          "authority": "(Zopf 1883) Lehmann and Neumann 1896 (Approved Lists 1980)",
          "basionym": {
            "name": "\"Bacterium tuberculosis\"",
            "authority": "Zopf 1883",
            "notes": [
              {
                "name": "Effective Name",
                "note": "This is an effectively published name.",
                "note_classifier": "effective_name"
              }
            ]
          }
        },
        "group_name": "high G+C Gram-positive bacteria",
        "has_type_material": true,
        "classification": {
          "superkingdom": {
            "name": "Bacteria",
            "id": 2
          },
          "kingdom": {
            "name": "Bacillati",
            "id": 1783272
          },
          "phylum": {
            "name": "Actinomycetota",
            "id": 201174
          },
          "class": {
            "name": "Actinomycetes",
            "id": 1760
          },
          "order": {
            "name": "Mycobacteriales",
            "id": 85007
          },
          "family": {
            "name": "Mycobacteriaceae",
            "id": 1762
          },
          "genus": {
            "name": "Mycobacterium",
            "id": 1763
          },
          "species": {
            "name": "Mycobacterium tuberculosis",
            "id": 1773
          }
        },
        "parents": [
          1,
          131567,
          2,
          1783272,
          201174,
          1760,
          85007,
          1762,
          1763,
          77643
        ],
        "children": [
          1427330,
          1427329
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 7819
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 4008
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 45
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 3
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 3906
          },
          {
            "type": "COUNT_TYPE_miscRNA",
            "count": 2
          },
          {
            "type": "COUNT_TYPE_ncRNA",
            "count": 20
          },
          {
            "type": "COUNT_TYPE_OTHER",
            "count": 2
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Mycobacterium tuberculosis"
      ]
    },
    {
      "taxonomy": {
        "tax_id": 5833,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Plasmodium falciparum"
        },
        "curator_common_name": "malaria parasite P. falciparum",
        "group_name": "apicomplexans",
        "classification": {
          "superkingdom": {
            "name": "Eukaryota",
            "id": 2759
          },
          "phylum": {
            "name": "Apicomplexa",
            "id": 5794
          },
          "class": {
            "name": "Aconoidasida",
            "id": 422676
          },
          "order": {
            "name": "Haemosporida",
            "id": 5819
          },
          "family": {
            "name": "Plasmodiidae",
            "id": 1639119
          },
          "genus": {
            "name": "Plasmodium",
            "id": 5820
          },
          "species": {
            "name": "Plasmodium falciparum",
            "id": 5833
          }
        },
        "parents": [
          1,
          131567,
          2759,
          2698737,
          33630,
          5794,
          422676,
          5819,
          1639119,
          5820,
          418107
        ],
        "children": [
          478864,
          1036723
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 67
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 5618
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 45
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 28
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 5285
          },
          {
            "type": "COUNT_TYPE_ncRNA",
            "count": 102
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Plasmodium falciparum"
      ]
    },
    {
      "taxonomy": {
        "tax_id": 5860,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Plasmodium vinckei",
          "authority": "(Rodhain, 1952)"
        },
        "group_name": "apicomplexans",
        "classification": {
          "superkingdom": {
            "name": "Eukaryota",
            "id": 2759
          },
          "phylum": {
            "name": "Apicomplexa",
            "id": 5794
          },
          "class": {
            "name": "Aconoidasida",
            "id": 422676
          },
          "order": {
            "name": "Haemosporida",
            "id": 5819
          },
          "family": {
            "name": "Plasmodiidae",
            "id": 1639119
          },
          "genus": {
            "name": "Plasmodium",
            "id": 5820
          },
          "species": {
            "name": "Plasmodium vinckei",
            "id": 5860
          }
        },
        "parents": [
          1,
          131567,
          2759,
          2698737,
          33630,
          5794,
          422676,
          5819,
          1639119,
          5820,
          418101
        ],
        "children": [
          54757,
          138298,
          138297,
          119398
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 10
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 5147
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 67
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 11
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 5050
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Plasmodium vinckei"
      ]
    },
    {
      "taxonomy": {
        "tax_id": 5855,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Plasmodium vivax",
          "authority": "(Grassi & Feletti, 1890)"
        },
        "curator_common_name": "malaria parasite P. vivax",
        "group_name": "apicomplexans",
        "classification": {
          "superkingdom": {
            "name": "Eukaryota",
            "id": 2759
          },
          "phylum": {
            "name": "Apicomplexa",
            "id": 5794
          },
          "class": {
            "name": "Aconoidasida",
            "id": 422676
          },
          "order": {
            "name": "Haemosporida",
            "id": 5819
          },
          "family": {
            "name": "Plasmodiidae",
            "id": 1639119
          },
          "genus": {
            "name": "Plasmodium",
            "id": 5820
          },
          "species": {
            "name": "Plasmodium vivax",
            "id": 5855
          }
        },
        "parents": [
          1,
          131567,
          2759,
          2698737,
          33630,
          5794,
          422676,
          5819,
          1639119,
          5820,
          418103
        ],
        "children": [
          31273,
          126793,
          1035514,
          1035515,
          882766,
          1077284,
          1033975
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 19
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 5513
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 44
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 22
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 5395
          },
          {
            "type": "COUNT_TYPE_miscRNA",
            "count": 10
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Plasmodium vivax"
      ]
    },
    {
      "taxonomy": {
        "tax_id": 5861,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Plasmodium yoelii"
        },
        "group_name": "apicomplexans",
        "classification": {
          "superkingdom": {
            "name": "Eukaryota",
            "id": 2759
          },
          "phylum": {
            "name": "Apicomplexa",
            "id": 5794
          },
          "class": {
            "name": "Aconoidasida",
            "id": 422676
          },
          "order": {
            "name": "Haemosporida",
            "id": 5819
          },
          "family": {
            "name": "Plasmodiidae",
            "id": 1639119
          },
          "genus": {
            "name": "Plasmodium",
            "id": 5820
          },
          "species": {
            "name": "Plasmodium yoelii",
            "id": 5861
          }
        },
        "parents": [
          1,
          131567,
          2759,
          2698737,
          33630,
          5794,
          422676,
          5819,
          1639119,
          5820,
          418101
        ],
        "children": [
          73239,
          1050261,
          31274,
          283801,
          1323249,
          1050262
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 15
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 6233
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 52
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 39
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 6037
          },
          {
            "type": "COUNT_TYPE_ncRNA",
            "count": 47
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Plasmodium yoelii"
      ]
    },
    {
      "taxonomy": {
        "tax_id": 5811,
        "rank": "SPECIES",
        "current_scientific_name": {
          "name": "Toxoplasma gondii"
        },
        "group_name": "apicomplexans",
        "classification": {
          "superkingdom": {
            "name": "Eukaryota",
            "id": 2759
          },
          "phylum": {
            "name": "Apicomplexa",
            "id": 5794
          },
          "class": {
            "name": "Conoidasida",
            "id": 1280412
          },
          "order": {
            "name": "Eucoccidiorida",
            "id": 75739
          },
          "family": {
            "name": "Sarcocystidae",
            "id": 5809
          },
          "genus": {
            "name": "Toxoplasma",
            "id": 5810
          },
          "species": {
            "name": "Toxoplasma gondii",
            "id": 5811
          }
        },
        "parents": [
          1,
          131567,
          2759,
          2698737,
          33630,
          5794,
          1280412,
          5796,
          75739,
          423054,
          5809,
          5810
        ],
        "children": [
          933077,
          398031
        ],
        "counts": [
          {
            "type": "COUNT_TYPE_ASSEMBLY",
            "count": 29
          },
          {
            "type": "COUNT_TYPE_GENE",
            "count": 8925
          },
          {
            "type": "COUNT_TYPE_tRNA",
            "count": 183
          },
          {
            "type": "COUNT_TYPE_rRNA",
            "count": 424
          },
          {
            "type": "COUNT_TYPE_PROTEIN_CODING",
            "count": 8318
          }
        ],
        "genomic_moltype": "dsDNA",
        "current_scientific_name_is_formal": true
      },
      "query": [
        "Toxoplasma gondii"
      ]
    }
  ],
  "total_count": 10
}

From this response we would like to render the following fields on a page (only two rows are shown):

[ ]	Taxon	TaxId	# Assemblies	Tags
[ ]	Anopheles gambiae	7165	7	Vector
[ ]	Coccidioides immitis	5501	5	Fungi

These are populated from the JSON response:

Taxon = reports[] -> taxonomy -> current_scientific_name -> name
TaxId = reports[] -> taxonomy -> tax_id
# Assemblies = reports[] -> taxonomy -> counts[] entry with type COUNT_TYPE_ASSEMBLY -> count
Tags = custom, added by us (not part of the NCBI response)
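
A minimal Python sketch of the mapping above, assuming the response has already been parsed into a dict. The function names and the `tags` lookup (keyed by tax ID) are hypothetical. The assembly count is looked up by its `type` rather than by position, so the sketch does not depend on the order of `counts`. `SAMPLE_RESPONSE` is a trimmed copy of the first report shown above.

```python
from typing import Any

# Trimmed copy of the first report in the response above.
SAMPLE_RESPONSE = {
    "reports": [
        {
            "taxonomy": {
                "tax_id": 7165,
                "current_scientific_name": {"name": "Anopheles gambiae"},
                "counts": [
                    {"type": "COUNT_TYPE_ASSEMBLY", "count": 7},
                    {"type": "COUNT_TYPE_GENE", "count": 15164},
                ],
            }
        }
    ]
}


def assembly_count(taxonomy: dict[str, Any]) -> int:
    """Return the COUNT_TYPE_ASSEMBLY count, or 0 if no such entry exists."""
    for entry in taxonomy.get("counts", []):
        if entry.get("type") == "COUNT_TYPE_ASSEMBLY":
            return entry.get("count", 0)
    return 0


def list_view_rows(response: dict[str, Any], tags: dict[int, list[str]]) -> list[dict[str, Any]]:
    """Build one list-view row per taxonomy report.

    `tags` is our own (hypothetical) lookup of custom tags keyed by tax ID;
    it is not part of the NCBI response.
    """
    rows = []
    for report in response.get("reports", []):
        taxonomy = report["taxonomy"]
        tax_id = taxonomy["tax_id"]
        rows.append(
            {
                "taxon": taxonomy["current_scientific_name"]["name"],
                "tax_id": tax_id,
                "assemblies": assembly_count(taxonomy),
                "tags": tags.get(tax_id, []),
            }
        )
    return rows


if __name__ == "__main__":
    for row in list_view_rows(SAMPLE_RESPONSE, {7165: ["Vector"]}):
        print(row)
```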

Genomes page

Now let's suppose that on the previous page a user ticked both the Anopheles gambiae and Coccidioides immitis checkboxes and pressed the "Go to Genomes" button.

This will be equivalent to sending the following GET request:

https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon/7165%2C5501/dataset_report?filters.assembly_source=refseq&filters.has_annotation=true&filters.exclude_paired_reports=true&filters.exclude_atypical=true&filters.assembly_level=scaffold&filters.assembly_level=chromosome&filters.assembly_level=complete_genome
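
As a sketch, the request URL above could be assembled from the selected tax IDs like this. The function name is hypothetical; the filter names and values are copied from the URL above, and only the Python standard library is used.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon"

# Filters copied from the request above; assembly_level is repeated once per value.
GENOME_FILTERS = [
    ("filters.assembly_source", "refseq"),
    ("filters.has_annotation", "true"),
    ("filters.exclude_paired_reports", "true"),
    ("filters.exclude_atypical", "true"),
    ("filters.assembly_level", "scaffold"),
    ("filters.assembly_level", "chromosome"),
    ("filters.assembly_level", "complete_genome"),
]


def genome_report_url(tax_ids: list[int]) -> str:
    """Return the genome dataset_report URL for the selected tax IDs."""
    # The tax IDs go into the path as one comma-separated, percent-encoded segment.
    taxon_segment = quote(",".join(str(t) for t in tax_ids), safe="")
    return f"{BASE_URL}/{taxon_segment}/dataset_report?{urlencode(GENOME_FILTERS)}"


if __name__ == "__main__":
    print(genome_report_url([7165, 5501]))
```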

The response will be rendered as the following genome page:

[ ]	Taxon	TaxId	Accession	IsRef	Level	# Chr	Len	# Scaffolds	Scaffold N50	Scaffold L50	Coverage	GC%	Ann Status
[ ]	Anopheles gambiae	7165	GCF_943734735.2	Yes	Chromosome	3	264451381	190	99149756	2	54.0x	44.5	Full annotation

Taxon = organism -> organism_name
TaxId = organism -> tax_id
Accession = accession
IsRef = assembly_info -> refseq_category
Level = assembly_info -> assembly_level
# Chr = assembly_stats -> total_number_of_chromosomes
Len = assembly_stats -> total_sequence_length
# Scaffolds = assembly_stats -> number_of_scaffolds
Scaffold N50 = assembly_stats -> scaffold_n50
Scaffold L50 = assembly_stats -> scaffold_l50
GC% = assembly_stats -> gc_percent
Ann Status = annotation_info -> status
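
The field mapping above could be sketched in Python as follows. `SAMPLE_GENOME_REPORT` is a hypothetical, trimmed report whose values are taken from the table row above; its value types, the helper names, and the rule that any non-empty `refseq_category` renders as "Yes" are assumptions. The Coverage column is left out because the mapping above does not name a source field for it.

```python
from typing import Any

# Hypothetical trimmed report; values come from the table row above.
SAMPLE_GENOME_REPORT = {
    "accession": "GCF_943734735.2",
    "organism": {"organism_name": "Anopheles gambiae", "tax_id": 7165},
    "assembly_info": {"refseq_category": "reference genome", "assembly_level": "Chromosome"},
    "assembly_stats": {
        "total_number_of_chromosomes": 3,
        "total_sequence_length": "264451381",
        "number_of_scaffolds": 190,
        "scaffold_n50": 99149756,
        "scaffold_l50": 2,
        "gc_percent": 44.5,
    },
    "annotation_info": {"status": "Full annotation"},
}


def dig(data: dict[str, Any], *keys: str) -> Any:
    """Walk nested dicts, returning None as soon as a key is missing."""
    current: Any = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def genome_row(report: dict[str, Any]) -> dict[str, Any]:
    """Map one genome report onto the columns of the genomes page."""
    return {
        "Taxon": dig(report, "organism", "organism_name"),
        "TaxId": dig(report, "organism", "tax_id"),
        "Accession": report.get("accession"),
        # Assumption: any non-empty refseq_category is shown as "Yes".
        "IsRef": "Yes" if dig(report, "assembly_info", "refseq_category") else "No",
        "Level": dig(report, "assembly_info", "assembly_level"),
        "# Chr": dig(report, "assembly_stats", "total_number_of_chromosomes"),
        "Len": dig(report, "assembly_stats", "total_sequence_length"),
        "# Scaffolds": dig(report, "assembly_stats", "number_of_scaffolds"),
        "Scaffold N50": dig(report, "assembly_stats", "scaffold_n50"),
        "Scaffold L50": dig(report, "assembly_stats", "scaffold_l50"),
        "GC%": dig(report, "assembly_stats", "gc_percent"),
        "Ann Status": dig(report, "annotation_info", "status"),
    }


if __name__ == "__main__":
    print(genome_row(SAMPLE_GENOME_REPORT))
```

Missing fields come back as `None` instead of raising, which may be convenient when an assembly lacks annotation or chromosome counts.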

NoopDog · 2024-11-06T23:40:46Z

Ok thx @nekrut we will start on this and collect the tables from NCBI...

NoopDog · 2024-11-06T23:56:42Z

Also link to UCSC genome browser in the genome file.

d-callan · 2024-11-14T23:13:09Z

Sry, not sure this is the right place for this comment.. but were it me I'd seriously consider adding some kinetoplastids to that list of initial taxa.

d-callan · 2024-11-14T23:36:44Z

T. Cruzi
T. Brucei
Leish major
Leish donovoni
Leish brazilensis

Those are the ones coming to me off the top of my head, though I feel like that's maybe missing a big leish species or two. I might not have the spelling quite right either.. it'd give you Chagas, African sleeping sickness and iirc all three forms of leish though I need to double check that. Considering the popularity of tritrypdb and the impact of these diseases, these species would be a very notable omission.

Also, pretty sure we now have a few locally acquired cases of mucosal leish in Texas, as the sandfly habitat expands, so there's 'local' relevance.. thanks global warming

nekrut · 2024-11-15T22:53:05Z

Here is the initial set of species: https://docs.google.com/spreadsheets/d/1Gg9sw2Qw765tOx2To53XkTAn-RAMiBtqYrfItlLXXrc/edit?usp=sharing

(replaces #153 )

hunterckx · 2024-11-16T00:02:27Z

@nekrut Question -- how can we map the genomes returned by NCBI to the UCSC Browser URLs specified in assemblyList.json? Previously we matched Genome Version/Assembly ID from this genomes spreadsheet with either genBank or refSeq from the assembly list, but I'm not familiar enough with what the fields mean to determine which ID(s) from the NCBI API would be necessary to match with the ones in the assembly list.

Thanks!

nekrut · 2024-11-16T01:48:07Z

Good point. They need to be built first. I will initiate the process over the weekend. This can happen very quickly, but for now let's not link them to UCSC yet.

NoopDog · 2024-12-05T14:51:22Z

@hunterckx, can you use the accession field from the NCBI response, e.g., "accession": "GCF_943734735.2", and provide a report on any that do not match either GenBank or RefSeq in the assemblyList.json?

Cheers,
D

NoopDog · 2024-12-05T14:57:20Z

Also, @hunterckx, please re-import the species list so we can get the latest updates. Thanks!

NoopDog · 2024-12-05T18:02:18Z

@hunterckx, the "Search all filters" option on the Genomes page throws an error. Can you please fix it? Thanks!

hunterckx · 2024-12-05T23:42:47Z

I've set it up to report separately on matches between pairedAccession, accession, genBank, and refSeq, since the matching here is only done between pairedAccession and genBank, and accession and refSeq. Here's what it reports (parentheses around column pairs that are not currently used for matching):

3 values from pairedAccession absent in genBank: GCA_000277735.2, GCA_030566675.1, GCA_963525475.1

(20 values from pairedAccession absent in refSeq: GCA_000195955.2, GCA_000002765.3, GCA_000002725.2, GCA_900002385.2, GCA_018416015.2, GCA_900681995.1, GCA_000227135.2, GCA_000006565.2, GCA_000002445.1, GCA_943734735.2, GCA_000002415.2, GCA_016801865.2, GCA_000002845.2, GCA_000209065.1, GCA_000149335.2, GCA_000277735.2, GCA_009858895.3, GCA_000857045.1, GCA_030566675.1, GCA_963525475.1)

(20 values from accession absent in genBank: GCF_000195955.2, GCF_000002765.6, GCF_000002725.2, GCF_900002385.2, GCF_018416015.2, GCF_900681995.1, GCF_000227135.1, GCF_000006565.2, GCF_000002445.2, GCF_943734735.2, GCF_000002415.2, GCF_016801865.2, GCF_000002845.2, GCF_000209065.1, GCF_000149335.2, GCF_000277735.2, GCF_009858895.2, GCF_000857045.1, GCF_030566675.1, GCF_963525475.1)

3 values from accession absent in refSeq: GCF_000277735.2, GCF_030566675.1, GCF_963525475.1

Looks like the ones missing for the used column pairs are also missing for the unused column pairs (i.e. we're not missing anything extra by not using those pairs)
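
A minimal sketch of this kind of report, assuming each NCBI row carries `accession` and `pairedAccession` and each assembly list entry carries `genBank` and `refSeq`. The record shapes and function names are hypothetical (the real layout of assemblyList.json may differ); the sample accessions are taken from the lists above.

```python
def absent(values: list[str], known: list[str]) -> list[str]:
    """Return the values that do not appear in `known`, preserving order."""
    known_set = set(known)
    return [value for value in values if value not in known_set]


def match_report(ncbi_rows: list[dict], assembly_list: list[dict]) -> dict[str, list[str]]:
    """Report unmatched IDs for the two column pairs used for matching."""
    gen_bank = [entry["genBank"] for entry in assembly_list if entry.get("genBank")]
    ref_seq = [entry["refSeq"] for entry in assembly_list if entry.get("refSeq")]
    return {
        "pairedAccession absent in genBank": absent(
            [row["pairedAccession"] for row in ncbi_rows], gen_bank
        ),
        "accession absent in refSeq": absent(
            [row["accession"] for row in ncbi_rows], ref_seq
        ),
    }


if __name__ == "__main__":
    # Hypothetical inputs: one assembly present in the list, one not yet added.
    ncbi_rows = [
        {"accession": "GCF_943734735.2", "pairedAccession": "GCA_943734735.2"},
        {"accession": "GCF_000277735.2", "pairedAccession": "GCA_000277735.2"},
    ]
    assembly_list = [{"genBank": "GCA_943734735.2", "refSeq": "GCF_943734735.2"}]
    for label, missing in match_report(ncbi_rows, assembly_list).items():
        print(f"{len(missing)} values from {label}: {', '.join(missing)}")
```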

hunterckx · 2024-12-06T07:23:01Z

Updated output after switching to algorithm proposed initially in #194:

3 accessions had no match in assembly list: GCF_000277735.2, GCF_030566675.1, GCF_963525475.1

(The same as above when just matching with refSeq)

I'll also note that this appears to have led to one UCSC Browser URL being left out, but that may be what we want if it was an erroneous match.

NoopDog · 2024-12-11T06:45:19Z

@nekrut now that we are using the spreadsheet to identify the curated list of assemblies to include, I suppose we could still call the taxon API and filter out all but the IDs that are given in the spreadsheet.

curl -X POST "https://api.ncbi.nlm.nih.gov/datasets/v2/taxonomy/dataset_report" \
 -H 'accept: application/json' \
 -H 'content-type: application/json' \
 -d '{"taxons":["Plasmodium falciparum","Plasmodium vivax","Plasmodium yoelii","Plasmodium vinckei","Culex pipiens","Anopheles gambiae","Toxoplasma gondii","Mycobacterium tuberculosis","Coccidioides posadasii","Coccidioides immitis"],"children":false,"ranks":["genus"]}'

Or .. Is there a different API we should use to look up the genome by ID?
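
One option that may fit here, offered as a sketch rather than a decision: the NCBI Datasets v2 API also has a genome report endpoint keyed by assembly accession instead of taxon. The function name below is hypothetical, and whether this endpoint returns everything the genomes page needs has not been checked in this thread.

```python
from urllib.parse import quote

ACCESSION_BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession"


def genome_by_accession_url(accessions: list[str]) -> str:
    """Return a dataset_report URL for an explicit list of assembly accessions."""
    # Accessions are sent as one comma-separated, percent-encoded path segment.
    segment = quote(",".join(accessions), safe="")
    return f"{ACCESSION_BASE_URL}/{segment}/dataset_report"


if __name__ == "__main__":
    print(genome_by_accession_url(["GCF_943734735.2", "GCF_000277735.2"]))
```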

NoopDog · 2024-12-11T06:49:37Z

@hunterckx

The assemblies

GCF_000277735.2_ASM27773v2
GCF_963525475.1_MtbRf
GCF_030566675.1_ASM3056667v1

have now been added to the UCSC system. Can you check whether we now match on the three accessions above, or whether the _ASM27773v2 etc. suffixes are causing a mismatch?
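
If the suffix does turn out to matter, one way to compare is to split the accession from the assembly name before matching. This is a sketch under the assumption that names follow the `GCF_<9 digits>.<version>_<name>` pattern seen in the three examples above; the function name is hypothetical.

```python
import re
from typing import Optional

# Accession (GCA_/GCF_ + 9 digits + version), optionally followed by "_<assembly name>".
ASSEMBLY_NAME_RE = re.compile(r"^(GC[AF]_\d{9}\.\d+)(?:_(.+))?$")


def split_assembly_name(name: str) -> Optional[tuple[str, Optional[str]]]:
    """Split 'GCF_000277735.2_ASM27773v2' into ('GCF_000277735.2', 'ASM27773v2').

    Returns None when the string does not start with a recognisable accession.
    """
    match = ASSEMBLY_NAME_RE.match(name)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for name in (
        "GCF_000277735.2_ASM27773v2",
        "GCF_963525475.1_MtbRf",
        "GCF_030566675.1_ASM3056667v1",
    ):
        print(split_assembly_name(name))
```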

hunterckx · 2024-12-11T07:56:03Z

@NoopDog Seems like everything matches up now -- I've pushed the latest to the branch for #178!

Smeds · 2024-12-11T21:42:40Z

@NoopDog Hi! I'm currently working on migrating the GenomeArk project, with the goal of displaying it using BRC. I've written a script to generate a JSON file containing genome-related data, and I've been experimenting with the BRC code to display this information. The displayed columns will largely remain the same, though some modifications might be necessary. Would you prefer we continue the discussion in this task or create a new one?

NoopDog · 2024-12-12T09:06:16Z

New ticket, please, for GenomeArk. Thanks, D

nekrut added this to BRC development tasks Nov 4, 2024

nekrut converted this from a draft issue Nov 5, 2024

nekrut assigned NoopDog and Smeds Nov 5, 2024

nekrut changed the title ~~Rendering filament pages using NCBI dataset commands~~ Rendering filament pages using NCBI dataset API Nov 5, 2024

NoopDog assigned hunterckx Nov 6, 2024

NoopDog mentioned this issue Nov 6, 2024

Read organism and genome info from NCBI #159

Closed

nekrut moved this to In Progress in BRC development tasks Nov 14, 2024

NoopDog mentioned this issue Nov 15, 2024

UI Exploration on Genomes list #177

Closed

MillenniumFalconMechanic self-assigned this Nov 18, 2024

NoopDog mentioned this issue Dec 5, 2024

Determine why the count of assemblies on the organisms list does not match the genomes for the taxon on the genomes list #192

Closed

Smeds mentioned this issue Jan 30, 2025

description moved from brc project galaxyproject/ga2#16

Open
