Rendering filament pages using NCBI dataset API #157
Comments
Ok thx @nekrut we will start on this and collect the tables from NCBI... |
Also link to UCSC genome browser in the genome file. |
Sry, not sure this is the right place for this comment.. but were it me I'd seriously consider adding some kinetoplastids to that list of initial taxa. |
T. cruzi Those are the ones coming to me off the top of my head, though I feel like that's maybe missing a big leish species or two. I might not have the spelling quite right either.. it'd give you Chagas, African sleeping sickness and iirc all three forms of leish though I need to double check that. Considering the popularity of tritrypdb and the impact of these diseases, these species would be a very notable omission. Also, pretty sure we now have a few locally acquired cases of mucosal leish in Texas, as the sandfly habitat expands, so there's 'local' relevance.. thanks global warming |
Here is the initial set of species https://docs.google.com/spreadsheets/d/1Gg9sw2Qw765tOx2To53XkTAn-RAMiBtqYrfItlLXXrc/edit?usp=sharing (replaces #153) |
@nekrut Question -- how can we map the genomes returned by NCBI to the UCSC Browser URLs specified in assemblyList.json? Previously we matched Thanks! |
Good point. They need to be built first. I will initiate the process over the weekend. This can happen very quickly, but for now let's not link them to UCSC yet. |
@hunterckx, can you use the accession field from the NCBI response, e.g., "accession": "GCF_943734735.2", and provide a report on any that do not match either GenBank or RefSeq in the assemblyList.json? Cheers, |
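A minimal sketch of the report requested above. The layout of assemblyList.json is not shown in this thread, so the key names `genBank` and `refSeq` are hypothetical placeholders for whatever the GenBank and RefSeq columns are actually called; the NCBI reports are assumed to be already parsed into dicts carrying an `accession` field.

```python
def unmatched_accessions(ncbi_reports, assembly_list):
    """Return NCBI accessions found in neither the GenBank nor the RefSeq column."""
    known = set()
    for entry in assembly_list:
        # "genBank" / "refSeq" are assumed key names, not confirmed ones.
        for key in ("genBank", "refSeq"):
            value = entry.get(key)
            if value:
                known.add(value.strip())
    return [
        report["accession"]
        for report in ncbi_reports
        if report.get("accession") and report["accession"] not in known
    ]
```

Passing the parsed NCBI reports and the parsed assemblyList.json entries returns the list of accessions that would need a manual look.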
Also, @hunterckx, please re-import the species list so we can get the latest updates. Thanks! |
@hunterckx, the "Search all filters" option on the Genomes page throws an error. Can you please fix it? Thanks! |
I've set it up to report separately on matches between
Looks like the ones missing for the used column pairs are also missing for the unused column pairs (i.e. we're not missing anything extra by not using those pairs) |
Updated output after switching to algorithm proposed initially in #194:
(The same as above when just matching with I'll also note that this appears to have led to one UCSC Browser URL being left out, but that may be what we want if it was an erroneous match |
@nekrut now that we are using the spreadsheet to identify the curated list of assemblies to include, I suppose we could still call the taxon API and filter out all but the IDs that are given in the spreadsheet.
Or .. Is there a different API we should use to look up the genome by ID? |
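One way the first option could look, as a sketch only: call the taxon endpoint as before, then keep just the reports whose accession appears in the curated spreadsheet. It assumes the parsed response has a top-level `reports` list and that the spreadsheet IDs are assembly accessions; the function name is made up for illustration.

```python
def filter_to_curated(response, curated_accessions):
    """Keep only reports whose accession is in the curated list."""
    curated = set(curated_accessions)
    return [
        report
        for report in response.get("reports", [])
        if report.get("accession") in curated
    ]
```

The NCBI Datasets v2 documentation also describes a genome report lookup by accession, which may be worth checking as an answer to the second question; this sketch does not rely on it.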
The assemblies GCF_000277735.2_ASM27773v2 have now been added to the UCSC system. Can you check to see if we now match on our three above, or are the _ASM27773v2 etc. causing us to mismatch? |
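If the `_ASM27773v2`-style suffix does turn out to cause mismatches, one possible workaround is to compare only the leading accession. A sketch, assuming identifiers always start with `GCA_` or `GCF_`, followed by digits, a dot, and a version number (the helper name is hypothetical):

```python
import re

# Matches the leading accession, e.g. "GCF_000277735.2", and ignores any suffix.
_ACCESSION_RE = re.compile(r"^(GC[AF]_\d+\.\d+)")

def core_accession(identifier):
    """Return the bare accession, or None if the string does not start with one."""
    match = _ACCESSION_RE.match(identifier.strip())
    return match.group(1) if match else None
```

Applying `core_accession` to both sides before comparing would treat `GCF_000277735.2_ASM27773v2` and `GCF_000277735.2` as the same assembly.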
@NoopDog Hi! I'm currently working on migrating the GenomeArk project, with the goal of displaying it using BRC. I've written a script to generate a JSON file containing genome-related data, and I've been experimenting with the BRC code to display this information. The displayed columns will largely remain the same, though some modifications might be necessary. Would you prefer we continue the discussion in this task or create a new one? |
New ticket, please, for GenomeArk.
Thanks,
D
|
This issue illustrates how the NCBI Datasets API can be used to generate the JSON blobs necessary for rendering filament pages (#130).
Linked Tickets
Data!
The initial set of taxa will be limited to these species: https://docs.google.com/spreadsheets/d/1Gg9sw2Qw765tOx2To53XkTAn-RAMiBtqYrfItlLXXrc/edit?usp=sharing
There is an issue on developing a data format for initializing the site: #201
List view
To populate this we call NCBI Datasets API to get additional info (not provided in the initialization JSON dataset):
This generates the following response:
Click to see JSON response
From this response we would like to render the following fields on a page (only showing two rows)
These are populated from the JSON response:
reports -> taxonomy -> current_scientific_name -> name
reports -> taxonomy -> taxid
reports -> taxonomy -> counts[0]
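The field paths above could be read out of the parsed response roughly as follows. This is a sketch: the helper names and the output keys (`name`, `taxid`, `first_count`) are made up for illustration, and `counts[0]` is passed through untouched because its inner structure is not shown here.

```python
def dig(obj, *path):
    """Follow a sequence of keys/indexes; return None if any step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return None
    return obj

def list_view_rows(response):
    """Build one row per entry in the response's `reports` list."""
    rows = []
    for report in response.get("reports", []):
        rows.append({
            "name": dig(report, "taxonomy", "current_scientific_name", "name"),
            "taxid": dig(report, "taxonomy", "taxid"),
            "first_count": dig(report, "taxonomy", "counts", 0),
        })
    return rows
```

Missing fields come back as `None` rather than raising, so a partially filled report still produces a row.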
Genomes page
Now let's suppose that on the previous page a user ticked both the Anopheles gambiae and Coccidioides immitis checkboxes and clicked the "Go to Genomes" button.
This will be equivalent to sending the following GET request:
https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon/7165%2C5501/dataset_report?filters.assembly_source=refseq&filters.has_annotation=true&filters.exclude_paired_reports=true&filters.exclude_atypical=true&filters.assembly_level=scaffold&filters.assembly_level=chromosome&filters.assembly_level=complete_genome
Which will be rendered as the following genome page:
organism -> organism_name
organism -> tax_id
accession
assembly_info -> refseq_category
assembly_info -> assembly_level
assembly_stats -> total_number_of_chromosomes
assembly_stats -> total_sequence_length
assembly_stats -> number_of_scaffolds
assembly_stats -> scaffold_n50
assembly_stats -> scaffold_l50
assembly_stats -> gc_percent
annotation_info -> status
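As a sketch of how the request and the field mapping above fit together: the first function rebuilds the GET URL from a list of taxon IDs using the same filters, and the second pulls the listed fields out of one genome report. Function names and the dotted output keys are illustrative choices; only the URL, the filters, and the field paths come from this issue.

```python
from urllib.parse import quote, urlencode

BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon"

def genome_report_url(tax_ids):
    """Build the dataset_report URL for the selected taxon IDs."""
    taxa = quote(",".join(str(t) for t in tax_ids), safe="")  # "," becomes %2C
    filters = [
        ("filters.assembly_source", "refseq"),
        ("filters.has_annotation", "true"),
        ("filters.exclude_paired_reports", "true"),
        ("filters.exclude_atypical", "true"),
        ("filters.assembly_level", "scaffold"),
        ("filters.assembly_level", "chromosome"),
        ("filters.assembly_level", "complete_genome"),
    ]
    return f"{BASE}/{taxa}/dataset_report?{urlencode(filters)}"

GENOME_FIELDS = [
    ("organism", "organism_name"),
    ("organism", "tax_id"),
    ("accession",),
    ("assembly_info", "refseq_category"),
    ("assembly_info", "assembly_level"),
    ("assembly_stats", "total_number_of_chromosomes"),
    ("assembly_stats", "total_sequence_length"),
    ("assembly_stats", "number_of_scaffolds"),
    ("assembly_stats", "scaffold_n50"),
    ("assembly_stats", "scaffold_l50"),
    ("assembly_stats", "gc_percent"),
    ("annotation_info", "status"),
]

def genome_row(report):
    """Flatten one genome report into a dict keyed by dotted field path."""
    row = {}
    for path in GENOME_FIELDS:
        value = report
        for key in path:
            value = value.get(key) if isinstance(value, dict) else None
        row[".".join(path)] = value
    return row
```

Calling `genome_report_url([7165, 5501])` reproduces the request shown above; any field absent from a report is left as `None` in the row.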