Commit 399e5f1

update README.md
1 parent 8d3331b, commit 399e5f1

9 files changed: +228 -93 lines changed

README.md (+101 -48)
@@ -32,7 +32,7 @@ More installation options can be found [here](https://oggmap.readthedocs.io/en/l
 We recommend installing `oggmap` in an independent conda environment to avoid dependent software conflicts.
 Please make a new python environment for `oggmap` and install dependent libraries in it.

-If you do not have a working installation of Python 3.8 (or later),
+If you do not have a working installation of Python 3.10 (or later),
 consider installing [Anaconda](https://docs.anaconda.com/anaconda/install/) or
 [Miniconda](https://docs.conda.io/en/latest/miniconda.html).

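The updated README raises the minimum interpreter from Python 3.8 to 3.10. Before creating the conda environment, a check like the following reports whether the active interpreter qualifies. It is a plain standard-library sketch, not part of oggmap; the `(3, 10)` threshold is taken from the README line in the hunk above.

```python
import sys

# Minimum version taken from the updated README ("Python 3.10 (or later)").
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_is_supported():
        print(f"Python {current}: OK for oggmap")
    else:
        sys.exit(f"Python {current} found; oggmap needs "
                 f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later")
```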
@@ -58,53 +58,92 @@ Detailed tutorials how to use `oggmap` can be found [here](https://oggmap.readth
 ### Update/download local ncbi taxonomic database:

 The following command downloads or updates your local copy of the
-NCBI's taxonomy database (~300MB). The database is saved at
-`~/.etetoolkit/taxa.sqlite`.
+NCBI's taxonomy database (~150MB). The database is saved at `-dbname`
+set to default `taxadb.sqlite`.
+
+```shell
+$ oggmap ncbitax -u -outdir taxadb -type taxa -dbname taxadb.sqlite
+$ rm -rf taxadb
+```

 ```python
 >>> from oggmap import ncbitax
->>> ncbitax.update_ncbi()
+>>> update_parser = ncbitax.define_parser()
+>>> update_args = update_parser.parse_args()
+>>> update_args.outdir = 'taxadb'
+>>> update_args.dbname = 'taxadb.sqlite'
+>>> ncbitax.update_ncbi(update_args)
 ```

 ### Step 1 - Get query species taxonomic lineage information:

 You can query a species lineage information based on its name or its
 taxID. For example `Danio rerio` with taxID `7955`:

+```shell
+$ oggmap qlin -q "Danio rerio" -dbname taxadb.sqlite
+$ oggmap qlin -qt 7955 -dbname taxadb.sqlite
+```
+
 ```python
 >>> from oggmap import qlin
->>> qlin.get_qlin(q = 'Danio rerio')
->>> qlin.get_qlin(qt = '7955')
+>>> qlin.get_qlin(q='Danio rerio',
+... dbname = 'taxadb.sqlite')
+>>> qlin.get_qlin(qt='7955',
+... dbname = 'taxadb.sqlite')
 ```

 You can get the query species topology as a tree.
 For example for `Danio rerio` with taxID `7955`:

 ```python
+>>> from io import StringIO
+>>> from Bio import Phylo
 >>> from oggmap import qlin
->>> query_topology = qlin.get_lineage_topo(qt = '7955')
->>> query_topology.write()
+>>> query_topology = qlin.get_lineage_topo(qt='7955',
+... dbname='taxadb.sqlite')
+>>> output = StringIO()
+>>> Phylo.write(query_topology, output, "newick")
+>>> output.getvalue().strip()
 ```

 ### Step 2 - Get query species orthomap from OrthoFinder results:

 The following code extracts the `orthomap` for `Danio rerio` based on pre-calculated
-OrthoFinder results and ensembl release-105:
+OrthoFinder results and ensembl release-113:

 OrthoFinder results (-S diamond_ultra_sens) using translated, longest-isoform coding sequences
-from ensembl release-105 have been archived and can be found
+from ensembl release-113 have been archived and can be found
 [here](https://zenodo.org/record/7242264#.Y1p19i0Rowc).

+```shell
+# download OrthoFinder example:
+$ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip
+$ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_Orthogroups.tsv.zip
+$ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_species_list.tsv
+
+# extract orthomap:
+$ oggmap of2orthomap -seqname 7955.danio_rerio.pep -qt 7955 \\
+-sl ensembl_113_orthofinder_last_species_list.tsv \\
+-oc ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip \\
+-og ensembl_113_orthofinder_last_Orthogroups.tsv.zip \\
+-dbname taxadb.sqlite
+```
+
 ```python
->>> from oggmap import datasets, of2orthomap
->>> datasets.ensembl105(datapath='.')
->>> query_orthomap = of2orthomap.get_orthomap(
-... seqname='Danio_rerio.GRCz11.cds.longest',
+>>> from oggmap import datasets, of2orthomap, qlin
+>>> datasets.ensembl113_last(datapath='.')
+>>> query_orthomap, orthofinder_species_list, of_species_abundance = of2orthomap.get_orthomap(
+... seqname='7955.danio_rerio.pep',
 ... qt='7955',
-... sl='ensembl_105_orthofinder_species_list.tsv',
-... oc='ensembl_105_orthofinder_Orthogroups.GeneCount.tsv',
-... og='ensembl_105_orthofinder_Orthogroups.tsv',
-... out=None, quiet=False, continuity=True, overwrite=True)
+... sl='ensembl_113_orthofinder_last_species_list.tsv',
+... oc='ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip',
+... og='ensembl_113_orthofinder_last_Orthogroups.tsv.zip',
+... out=None,
+... quiet=False,
+... continuity=True,
+... overwrite=True,
+... dbname='taxadb.sqlite')
 >>> query_orthomap
 ```

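This commit switches the README example from plain `.tsv` inputs to zipped ones (`...Orthogroups.GeneCount.tsv.zip`, `...Orthogroups.tsv.zip`). To peek into such an archive without unpacking it, the standard library is enough. The sketch below is not how `of2orthomap` reads its input; it assumes the archive holds a single tab-separated file in OrthoFinder's `Orthogroups.GeneCount.tsv` layout (an `Orthogroup` column, one column per species, then `Total`) and that the species column is named like the `-seqname` value, e.g. `7955.danio_rerio.pep`.

```python
import csv
import io
import zipfile


def read_gene_counts(zip_path, species):
    """Return {orthogroup: gene count} for one species column of a zipped
    OrthoFinder Orthogroups.GeneCount.tsv.

    Assumption: the archive holds one .tsv member with an 'Orthogroup' column
    and a column named exactly like `species`.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Take the first .tsv member; its name inside the archive is not assumed.
        tsv_members = [n for n in archive.namelist() if n.endswith(".tsv")]
        if not tsv_members:
            raise ValueError(f"no .tsv file inside {zip_path}")
        with archive.open(tsv_members[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter="\t")
            return {row["Orthogroup"]: int(row[species]) for row in reader}


# Hypothetical usage with the files downloaded in the README example:
# counts = read_gene_counts(
#     "ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip",
#     "7955.danio_rerio.pep")
# present = [og for og, n in counts.items() if n > 0]
```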
@@ -114,9 +153,21 @@ The following code extracts the gene to transcript table for `Danio rerio`:

 GTF file obtained from [here](https://ftp.ensembl.org/pub/release-105/gtf/danio_rerio/Danio_rerio.GRCz11.105.gtf.gz).

+```shell
+# to get GTF from Mus musculus on Linux run:
+$ wget https://ftp.ensembl.org/pub/release-113/gtf/mus_musculus/Mus_musculus.GRCm39.113.chr.gtf.gz
+# on Mac:
+$ curl https://ftp.ensembl.org/pub/release-113/gtf/mus_musculus/Mus_musculus.GRCm39.113.chr.gtf.gz --remote-name
+
+# create t2g from GTF:
+$ oggmap gtf2t2g -i Mus_musculus.GRCm39.113.chr.gtf.gz \\
+-o Mus_musculus.GRCm39.113.chr.gtf.t2g.tsv \\
+-g -b -p -v -s
+```
+
 ```python
 >>> from oggmap import datasets, gtf2t2g
->>> gtf_file = datasets.zebrafish_gtf(datapath='.')
+>>> gtf_file = datasets.zebrafish_ensembl113_gtf(datapath='.')
 >>> query_species_t2g = gtf2t2g.parse_gtf(
 ... gtf=gtf_file,
 ... g=True, b=True, p=True, v=True, s=True, q=True)
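The `gtf2t2g` step turns a GTF file into a transcript-to-gene table. The core idea can be sketched with the standard library alone: read the ninth (attribute) column of each `transcript` line and pull out `transcript_id`, `gene_id` and `gene_name`. This is a simplified stand-in, not oggmap's `parse_gtf`: it does not implement the `-g -b -p -v -s` options, and relying on `transcript` feature lines is an assumption that holds for Ensembl GTFs such as the ones linked in this hunk.

```python
import gzip
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def transcript_to_gene(gtf_path):
    """Yield (transcript_id, gene_id, gene_name) once per transcript line.

    Simplified sketch: no version stripping, biotype or protein-ID columns.
    Plain and gzip-compressed (.gz) GTF files are both accepted.
    """
    opener = gzip.open if str(gtf_path).endswith(".gz") else open
    with opener(gtf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header lines such as '#!genome-build ...'
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue  # skip gene/exon/CDS/... feature lines
            attrs = dict(_ATTR.findall(fields[8]))
            yield (attrs.get("transcript_id"), attrs.get("gene_id"),
                   attrs.get("gene_name", ""))


# Hypothetical usage:
# for tx, gene, name in transcript_to_gene("Mus_musculus.GRCm39.113.chr.gtf.gz"):
#     print(tx, gene, name, sep="\t")
```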
@@ -161,18 +212,18 @@ one can calculate the transcriptome evolutionary index (TEI) and add them to the

 ```python
 >>> # add TEI values to existing adata object
->>> orthomap2tei.get_tei(adata=zebrafish_data,
-... gene_id=query_orthomap['geneID'],
-... gene_age=query_orthomap['PSnum'],
-... keep='min',
-... layer=None,
-... add=True,
-... obs_name='tei',
-... boot=False,
-... bt=10,
-... normalize_total=False,
-... log1p=False,
-... target_sum=1e6)
+>>> orthomap2tei.get_tei(adata = zebrafish_data,
+... gene_id = query_orthomap['geneID'],
+... gene_age = query_orthomap['PSnum'],
+... keep = 'min',
+... layer = None,
+... add = True,
+... obs_name = 'tei',
+... boot = False,
+... bt = 10,
+... normalize_total = False,
+... log1p = False,
+... target_sum = 1e6)
 ```

 ### Step 5 - Downstream analysis
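`get_tei` attaches one TEI value per cell. In the transcriptome-index literature the value for a cell or sample s is usually the expression-weighted mean gene age, TEI_s = sum_i(ps_i * e_is) / sum_i(e_is), where ps_i is the phylostratum of gene i (the `PSnum` column of the orthomap) and e_is its expression in s. The plain-Python sketch below computes that formula for a single cell. It illustrates the definition and is not oggmap's implementation; it leaves out the `keep`, `layer`, `normalize_total`, `log1p` and bootstrap options used in the call above.

```python
import math


def tei(expression, phylostratum):
    """Expression-weighted mean gene age for one cell or sample.

    expression:   {gene_id: expression value in this cell}
    phylostratum: {gene_id: gene age, e.g. the orthomap 'PSnum'}
    Genes without an assigned age are ignored; returns NaN if the remaining
    genes have zero total expression.
    """
    shared = [gene for gene in expression if gene in phylostratum]
    total = sum(expression[gene] for gene in shared)
    if total == 0:
        return math.nan
    weighted = sum(phylostratum[gene] * expression[gene] for gene in shared)
    return weighted / total


# Gene 'c' has no age and is skipped: (1*2 + 4*1) / (2 + 1) = 2.0
print(tei({"a": 2.0, "b": 1.0, "c": 5.0}, {"a": 1, "b": 4}))
```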
@@ -184,13 +235,13 @@ by any given observation pre-defined in the scRNA dataset.
 #### Boxplot TEI per stage:

 ```python
->>>sc.pl.violin(adata=zebrafish_data,
-... keys=['tei'],
-... groupby='stage',
-... rotation=90,
-... palette='Paired',
-... stripplot=False,
-... inner='box')
+>>>sc.pl.violin(adata = zebrafish_data,
+... keys = ['tei'],
+... groupby = 'stage',
+... rotation = 90,
+... palette = 'Paired',
+... stripplot = False,
+... inner = 'box')
 ```

 ## oggmap via Command Line
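The violin plot in this hunk summarises the `tei` column per `stage`. When the underlying numbers are needed, for example for a table, the quartiles drawn by the inner box can be computed with the standard library. The sketch assumes the TEI values and stage labels are available as two parallel sequences (for instance the `tei` and `stage` columns of `zebrafish_data.obs`); it does not reproduce scanpy's plotting, and its quartile method may differ slightly from the one the plotting backend uses.

```python
from collections import defaultdict
from statistics import median, quantiles


def tei_summary_by_group(tei_values, groups):
    """Return {group: {'n', 'q1', 'median', 'q3'}} for parallel sequences."""
    by_group = defaultdict(list)
    for value, group in zip(tei_values, groups):
        by_group[group].append(value)

    summary = {}
    for group, values in by_group.items():
        if len(values) > 1:
            # statistics.quantiles uses the 'exclusive' method by default.
            q1, _, q3 = quantiles(values, n=4)
        else:
            q1 = q3 = values[0]  # a single value is its own quartiles
        summary[group] = {"n": len(values), "q1": q1,
                          "median": median(values), "q3": q3}
    return summary


# Hypothetical usage with an AnnData object:
# tei_summary_by_group(zebrafish_data.obs["tei"], zebrafish_data.obs["stage"])
```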
@@ -200,36 +251,38 @@ by any given observation pre-defined in the scRNA dataset.
 Command line documentation can be found [here](https://oggmap.readthedocs.io/en/latest/modules/oggmap.html).

 ```shell
-$ oggmap
+$ oggmap -h
 ```

 ```
 usage: oggmap <sub-command>

 oggmap

-optional arguments:
+options:
 -h, --help show this help message and exit

 sub-commands:
-{cds2aa,gtf2t2g,ncbitax,of2orthomap,plaza2orthomap,qlin}
+{cds2aa,gtf2t2g,ncbitax,of2orthomap,orthomcl2orthomap,plaza2orthomap,qlin}
 sub-commands help
 cds2aa translate CDS to AA and optional retain longest
 isoform <cds2aa -h>
-gtf2t2g extracts transcript to gene table from GTF <gtf2t2g
--h>
+gtf2t2g extract transcript to gene table from GTF
+<gtf2t2g -h>
 ncbitax update local ncbi taxonomy database <ncbitax -h>
-of2orthomap extract orthomap from OrthoFinder output for query
-species <orthomap -h>
-plaza2orthomap extract orthomap from PLAZA gene family data for query
-species <of2orthomap -h>
+of2orthomap extract orthomap from OrthoFinder output for
+query species <of2orthomap -h>
+orthomcl2orthomap extract orthomap from orthomcl output for
+query species <orthomcl2orthomap -h>
+plaza2orthomap extract orthomap from PLAZA gene family data
+for query species <of2orthomap -h>
 qlin get query lineage based on ncbi taxonomy <qlin -h>
 ```

 To retrieve e.g. the lineage information for `Danio rerio` run the following command:

 ```shell
-$ oggmap qlin -q "Danio rerio"
+$ oggmap qlin -q "Danio rerio" -dbname taxadb.sqlite
 ```

 ## Development Version

src/oggmap/__main__.py (+38 -10)
@@ -97,26 +97,38 @@ def define_parser():
 '''
 plaza2orthomap_example = '''plaza2orthomap example:

+# download Species information and Gene Family Clusters from Dicots PLAZA 5.0 data:
+$ wget https://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_05/SpeciesInformation/species_information.csv.gz
+$ gunzip species_information.csv.gz
+$ wget https://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_05/GeneFamilies/genefamily_data.ORTHOFAM.csv.gz
+$ gunzip genefamily_data.ORTHOFAM.csv.gz
+$ wget https://ftp.psb.ugent.be/pub/plaza/plaza_public_dicots_05/GeneFamilies/genefamily_data.HOMFAM.csv.gz
+$ gunzip genefamily_data.HOMFAM.csv.gz
+
 # using Orthologous gene family
 $ plaza2orthomap -qt 3702 \\
 -sl species_information.csv \\
 -og genefamily_data.ORTHOFAM.csv \\
--out 3702.orthofam.orthomap
+-out 3702.orthofam.orthomap \\
+-dbname taxadb.sqlite

 # using Homologous gene family
 $ plaza2orthomap -qt 3702 \\
--sl species_information.csv \\
+-sl species_information.csv \\
 -og genefamily_data.HOMFAM.csv \\
--out 3702.homfam.orthomap
+-out 3702.homfam.orthomap \\
+-dbname taxadb.sqlite
 '''
 qlin_example = '''qlin example:

-# get query lineage to be used with oggmap later on using query species taxid
+# get query lineage to be used with oggmap later on using query species taxID
 # Mus musculus; 10090
-$ qlin -qt 10090
+$ qlin -qt 10090 \\
+-dbname taxadb.sqlite

 # using query species name
-$ qlin -q "Mus musculus"
+$ qlin -q "Mus musculus" \\
+-dbname taxadb.sqlite
 '''
 cds2aa_parser = subparsers.add_parser(name='cds2aa',
 help='translate CDS to AA and optional retain longest isoform <cds2aa -h>',
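The new `plaza2orthomap` help text downloads the PLAZA 5.0 files with `wget` and unpacks them with `gunzip`. Where `gunzip` is not available (for example on a plain Windows setup), the same unpacking step can be done from Python. A minimal sketch using `gzip` and `shutil` from the standard library; the function name is hypothetical and, unlike the `gunzip` command, it keeps the original `.gz` file.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path):
    """Decompress 'name.ext.gz' to 'name.ext' in the same directory.

    Returns the output path. The .gz input is left in place.
    """
    src = Path(path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src}")
    dst = src.with_suffix("")  # strips only the trailing '.gz'
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks instead of one big read
    return dst


# e.g. gunzip("genefamily_data.ORTHOFAM.csv.gz") writes genefamily_data.ORTHOFAM.csv
```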
@@ -235,6 +247,9 @@ def main():
 ncbitax.update_ncbi(args)
 if args.subcommand == 'of2orthomap':
 print(args)
+if not args.dbname:
+print('\nError <-dbname> : Please specify taxadb.sqlite file')
+sys.exit()
 if not args.seqname:
 parser.print_help()
 print('\nError <-seqname>: Please specify query species name in orthofinder and taxid')
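This commit adds the same `if not args.dbname:` guard to the `of2orthomap`, `orthomcl2orthomap`, `plaza2orthomap` and `qlin` branches of `main()`. One possible alternative, sketched here as a hypothetical refactor rather than oggmap's actual parser, is to declare `-dbname` once on a parent parser with `required=True` and share it between sub-commands. argparse then prints the usage and exits with status 2 when the option is missing, whereas a bare `sys.exit()` exits with status 0. Only `-dbname` is shown; the real sub-parsers define many more options.

```python
import argparse


def build_parser():
    """Hypothetical sketch: sub-commands that need the taxonomy database
    inherit one required -dbname option from a shared parent parser."""
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("-dbname", required=True,
                        help="local taxonomy database, e.g. taxadb.sqlite")

    parser = argparse.ArgumentParser(prog="oggmap")
    subparsers = parser.add_subparsers(dest="subcommand")
    # Sub-command names as listed in the oggmap help output.
    for name in ("of2orthomap", "orthomcl2orthomap", "plaza2orthomap", "qlin"):
        subparsers.add_parser(name, parents=[common])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["qlin", "-dbname", "taxadb.sqlite"])
    print(args.subcommand, args.dbname)  # qlin taxadb.sqlite
```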
@@ -263,9 +278,13 @@ def main():
 out=args.out,
 quiet=False,
 continuity=True,
-overwrite=args.overwrite)
+overwrite=args.overwrite,
+dbname=args.dbname)
 if args.subcommand == 'orthomcl2orthomap':
 print(args)
+if not args.dbname:
+print('\nError <-dbname> : Please specify taxadb.sqlite file')
+sys.exit()
 if not args.tla:
 parser.print_help()
 print('\nError <-tla>: Please specify query species orthomcl short name (THREE_LETTER_ABBREV)')
@@ -284,9 +303,13 @@ def main():
 out=args.out,
 quiet=False,
 continuity=True,
-overwrite=args.overwrite)
+overwrite=args.overwrite,
+dbname=args.dbname)
 if args.subcommand == 'plaza2orthomap':
 print(args)
+if not args.dbname:
+print('\nError <-dbname> : Please specify taxadb.sqlite file')
+sys.exit()
 if not args.qt:
 parser.print_help()
 print('\nError <-qt>: Please specify query species taxID')
@@ -306,9 +329,13 @@ def main():
 out=args.out,
 quiet=False,
 continuity=True,
-overwrite=args.overwrite)
+overwrite=args.overwrite,
+dbname=args.dbname)
 if args.subcommand == 'qlin':
 print(args)
+if not args.dbname:
+print('\nError <-dbname> : Please specify taxadb.sqlite file')
+sys.exit()
 if not args.q and not args.qt:
 parser.print_help()
 print('\nError <-q> <-qt>: Please specify query species name or taxid')
@@ -319,7 +346,8 @@ def main():
 sys.exit()
 qlin.get_qlin(q=args.q,
 qt=args.qt,
-quiet=False)
+quiet=False,
+dbname=args.dbname)


 if __name__ == '__main__':
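The new guards only check that `-dbname` was given, not that it points at a usable file. If a database is opened with a plain `sqlite3.connect(path)`, a mistyped path silently creates a new, empty database file instead of failing. A small pre-flight check such as the following opens the file read-only through an SQLite URI (`mode=ro`) and lists its tables. It uses the standard library only, the function name is hypothetical, and it makes no assumption about the table layout of oggmap's `taxadb.sqlite`.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def check_taxadb(dbname):
    """Return the table names of an existing SQLite database, or raise.

    FileNotFoundError: the path does not exist.
    sqlite3.DatabaseError: the file exists but is not an SQLite database.
    """
    path = Path(dbname)
    if not path.is_file():
        raise FileNotFoundError(f"-dbname: {dbname} does not exist")
    # mode=ro opens read-only and never creates a new, empty database.
    uri = path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Hypothetical usage before running a sub-command:
# print(check_taxadb("taxadb.sqlite"))
```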
