@@ -32,7 +32,7 @@ More installation options can be found [here](https://oggmap.readthedocs.io/en/l
We recommend installing `oggmap` in an independent conda environment to avoid dependent software conflicts.
Please make a new python environment for `oggmap` and install dependent libraries in it.

- If you do not have a working installation of Python 3.8 (or later),
+ If you do not have a working installation of Python 3.10 (or later),
consider installing [Anaconda](https://docs.anaconda.com/anaconda/install/) or
[Miniconda](https://docs.conda.io/en/latest/miniconda.html).

@@ -58,53 +58,92 @@ Detailed tutorials how to use `oggmap` can be found [here](https://oggmap.readth
### Update/download local ncbi taxonomic database:

The following command downloads or updates your local copy of the
- NCBI's taxonomy database (~300MB). The database is saved at
- `~/.etetoolkit/taxa.sqlite`.
+ NCBI's taxonomy database (~150MB). The database is saved to the path given by
+ `-dbname` (default: `taxadb.sqlite`).
+
+ ```shell
+ $ oggmap ncbitax -u -outdir taxadb -type taxa -dbname taxadb.sqlite
+ $ rm -rf taxadb
+ ```

```python
>>> from oggmap import ncbitax
- >>> ncbitax.update_ncbi()
+ >>> update_parser = ncbitax.define_parser()
+ >>> update_args = update_parser.parse_args()
+ >>> update_args.outdir = 'taxadb'
+ >>> update_args.dbname = 'taxadb.sqlite'
+ >>> ncbitax.update_ncbi(update_args)
```
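
The parser-based call above follows a common `argparse` pattern: build the parser, obtain a namespace of defaults, then override individual attributes before handing the namespace on. Below is a minimal sketch of that pattern with a hypothetical stand-in parser. `define_demo_parser` and its options are assumptions for illustration, not the actual `ncbitax` parser. Passing an empty list to `parse_args` keeps unrelated entries in `sys.argv` (for example inside a notebook) from being parsed; whether that also works with the real parser depends on whether it declares required arguments.

```python
import argparse


def define_demo_parser():
    """Hypothetical stand-in for a sub-command parser (not oggmap's own)."""
    parser = argparse.ArgumentParser(prog='demo')
    parser.add_argument('-outdir', default='taxadb')
    parser.add_argument('-dbname', default='taxadb.sqlite')
    parser.add_argument('-u', action='store_true')
    return parser


# parse an empty argument list so that only the declared defaults are used
demo_args = define_demo_parser().parse_args([])
# override selected attributes before passing the namespace to a function
demo_args.outdir = 'my_taxadb'
demo_args.u = True
print(vars(demo_args))
```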

### Step 1 - Get query species taxonomic lineage information:

You can query a species lineage information based on its name or its
taxID. For example `Danio rerio` with taxID `7955`:

+ ```shell
+ $ oggmap qlin -q "Danio rerio" -dbname taxadb.sqlite
+ $ oggmap qlin -qt 7955 -dbname taxadb.sqlite
+ ```
+
```python
>>> from oggmap import qlin
- >>> qlin.get_qlin(q='Danio rerio')
- >>> qlin.get_qlin(qt='7955')
+ >>> qlin.get_qlin(q='Danio rerio',
+ ...     dbname='taxadb.sqlite')
+ >>> qlin.get_qlin(qt='7955',
+ ...     dbname='taxadb.sqlite')
```

You can get the query species topology as a tree.
For example for `Danio rerio` with taxID `7955`:

```python
+ >>> from io import StringIO
+ >>> from Bio import Phylo
>>> from oggmap import qlin
- >>> query_topology = qlin.get_lineage_topo(qt='7955')
- >>> query_topology.write()
+ >>> query_topology = qlin.get_lineage_topo(qt='7955',
+ ...     dbname='taxadb.sqlite')
+ >>> output = StringIO()
+ >>> Phylo.write(query_topology, output, "newick")
+ >>> output.getvalue().strip()
```

### Step 2 - Get query species orthomap from OrthoFinder results:

The following code extracts the `orthomap` for `Danio rerio` based on pre-calculated
- OrthoFinder results and ensembl release-105:
+ OrthoFinder results and ensembl release-113:

OrthoFinder results (-S diamond_ultra_sens) using translated, longest-isoform coding sequences
- from ensembl release-105 have been archived and can be found
+ from ensembl release-113 have been archived and can be found
[here](https://zenodo.org/record/7242264#.Y1p19i0Rowc).

+ ```shell
+ # download OrthoFinder example:
+ $ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip
+ $ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_Orthogroups.tsv.zip
+ $ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_species_list.tsv
+
+ # extract orthomap:
+ $ oggmap of2orthomap -seqname 7955.danio_rerio.pep -qt 7955 \
+     -sl ensembl_113_orthofinder_last_species_list.tsv \
+     -oc ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip \
+     -og ensembl_113_orthofinder_last_Orthogroups.tsv.zip \
+     -dbname taxadb.sqlite
+ ```
+
```python
- >>> from oggmap import datasets, of2orthomap
- >>> datasets.ensembl105(datapath='.')
- >>> query_orthomap = of2orthomap.get_orthomap(
- ...     seqname='Danio_rerio.GRCz11.cds.longest',
+ >>> from oggmap import datasets, of2orthomap, qlin
+ >>> datasets.ensembl113_last(datapath='.')
+ >>> query_orthomap, orthofinder_species_list, of_species_abundance = of2orthomap.get_orthomap(
+ ...     seqname='7955.danio_rerio.pep',
...     qt='7955',
- ...     sl='ensembl_105_orthofinder_species_list.tsv',
- ...     oc='ensembl_105_orthofinder_Orthogroups.GeneCount.tsv',
- ...     og='ensembl_105_orthofinder_Orthogroups.tsv',
- ...     out=None, quiet=False, continuity=True, overwrite=True)
+ ...     sl='ensembl_113_orthofinder_last_species_list.tsv',
+ ...     oc='ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip',
+ ...     og='ensembl_113_orthofinder_last_Orthogroups.tsv.zip',
+ ...     out=None,
+ ...     quiet=False,
+ ...     continuity=True,
+ ...     overwrite=True,
+ ...     dbname='taxadb.sqlite')
>>> query_orthomap
```

@@ -114,9 +153,21 @@ The following code extracts the gene to transcript table for `Danio rerio`:

GTF file obtained from [here](https://ftp.ensembl.org/pub/release-105/gtf/danio_rerio/Danio_rerio.GRCz11.105.gtf.gz).

+ ```shell
+ # to get GTF from Mus musculus on Linux run:
+ $ wget https://ftp.ensembl.org/pub/release-113/gtf/mus_musculus/Mus_musculus.GRCm39.113.chr.gtf.gz
+ # on Mac:
+ $ curl https://ftp.ensembl.org/pub/release-113/gtf/mus_musculus/Mus_musculus.GRCm39.113.chr.gtf.gz --remote-name
+
+ # create t2g from GTF:
+ $ oggmap gtf2t2g -i Mus_musculus.GRCm39.113.chr.gtf.gz \
+     -o Mus_musculus.GRCm39.113.chr.gtf.t2g.tsv \
+     -g -b -p -v -s
+ ```
+
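GTF files store `gene_id` and `transcript_id` in their ninth, semicolon-separated attribute column, which is the information a transcript-to-gene table is built from. The following is a minimal pure-Python sketch of that idea, assuming Ensembl-style attributes. The demo records and the helper names `parse_attributes` and `transcript_to_gene` are made up for illustration and are not part of `oggmap`; the `gtf2t2g` sub-command offers further options that this sketch does not cover.

```python
# three made-up GTF records (tab-separated, nine columns); all IDs are hypothetical
demo_gtf = (
    '1\tdemo\ttranscript\t1\t100\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; gene_name "abc";\n'
    '1\tdemo\texon\t1\t50\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; gene_name "abc";\n'
    '1\tdemo\ttranscript\t200\t300\t.\t-\t.\t'
    'gene_id "G2"; transcript_id "T2";\n'
)


def parse_attributes(field):
    """Turn the GTF attribute column into a dict of key -> unquoted value."""
    attributes = {}
    for item in field.strip().split(';'):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(' ')
        attributes[key] = value.strip('"')
    return attributes


def transcript_to_gene(lines):
    """Collect (transcript_id, gene_id) pairs from 'transcript' features."""
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue  # skip blank lines and header comments
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 9 or fields[2] != 'transcript':
            continue  # only transcript features carry one row per transcript
        attributes = parse_attributes(fields[8])
        pairs.append((attributes['transcript_id'], attributes['gene_id']))
    return pairs


t2g = transcript_to_gene(demo_gtf.splitlines())
print(t2g)
```
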
```python
>>> from oggmap import datasets, gtf2t2g
- >>> gtf_file = datasets.zebrafish_gtf(datapath='.')
+ >>> gtf_file = datasets.zebrafish_ensembl113_gtf(datapath='.')
>>> query_species_t2g = gtf2t2g.parse_gtf(
...     gtf=gtf_file,
...     g=True, b=True, p=True, v=True, s=True, q=True)
@@ -161,18 +212,18 @@ one can calculate the transcriptome evolutionary index (TEI) and add them to the

```python
>>> # add TEI values to existing adata object
- >>> orthomap2tei.get_tei(adata=zebrafish_data,
- ...     gene_id=query_orthomap['geneID'],
- ...     gene_age=query_orthomap['PSnum'],
- ...     keep='min',
- ...     layer=None,
- ...     add=True,
- ...     obs_name='tei',
- ...     boot=False,
- ...     bt=10,
- ...     normalize_total=False,
- ...     log1p=False,
- ...     target_sum=1e6)
+ >>> orthomap2tei.get_tei(adata=zebrafish_data,
+ ...     gene_id=query_orthomap['geneID'],
+ ...     gene_age=query_orthomap['PSnum'],
+ ...     keep='min',
+ ...     layer=None,
+ ...     add=True,
+ ...     obs_name='tei',
+ ...     boot=False,
+ ...     bt=10,
+ ...     normalize_total=False,
+ ...     log1p=False,
+ ...     target_sum=1e6)
```
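
For orientation, a transcriptome index of this kind is commonly defined as the expression-weighted mean gene age per cell: the sum of (gene age × expression) over all genes divided by the summed expression. The sketch below illustrates only that arithmetic on made-up data. The function `tei_per_cell` and the demo dictionaries are hypothetical and are not the implementation behind `orthomap2tei.get_tei`, which additionally handles options such as `keep`, `normalize_total` and `log1p`.

```python
def tei_per_cell(counts, gene_age):
    """Expression-weighted mean gene age for one cell.

    counts: dict mapping gene -> expression value
    gene_age: dict mapping gene -> gene age class (e.g. phylostratum number)
    Genes without an assigned age are ignored.
    """
    shared = [gene for gene in counts if gene in gene_age]
    total = sum(counts[gene] for gene in shared)
    if total == 0:
        return float('nan')  # no expressed gene with a known age
    return sum(counts[gene] * gene_age[gene] for gene in shared) / total


# made-up gene ages and expression values
demo_age = {'geneA': 1, 'geneB': 5, 'geneC': 12}
demo_cells = {
    'cell1': {'geneA': 10, 'geneB': 0, 'geneC': 0},
    'cell2': {'geneA': 2, 'geneB': 2, 'geneC': 4, 'geneD': 7},
}
demo_tei = {cell: tei_per_cell(counts, demo_age)
            for cell, counts in demo_cells.items()}
print(demo_tei)
```

A cell that expresses only old genes (low age class) gets a low value, while expression of younger genes pulls the value up; `geneD` has no assigned age here and does not contribute.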

### Step 5 - Downstream analysis
@@ -184,13 +235,13 @@ by any given observation pre-defined in the scRNA dataset.
#### Boxplot TEI per stage:

```python
- >>> sc.pl.violin(adata=zebrafish_data,
- ...     keys=['tei'],
- ...     groupby='stage',
- ...     rotation=90,
- ...     palette='Paired',
- ...     stripplot=False,
- ...     inner='box')
+ >>> sc.pl.violin(adata=zebrafish_data,
+ ...     keys=['tei'],
+ ...     groupby='stage',
+ ...     rotation=90,
+ ...     palette='Paired',
+ ...     stripplot=False,
+ ...     inner='box')
```

## oggmap via Command Line
@@ -200,36 +251,38 @@ by any given observation pre-defined in the scRNA dataset.
Command line documentation can be found [here](https://oggmap.readthedocs.io/en/latest/modules/oggmap.html).

```shell
- $ oggmap
+ $ oggmap -h
```

```
usage: oggmap <sub-command>

oggmap

- optional arguments:
+ options:
  -h, --help            show this help message and exit

sub-commands:
-   {cds2aa,gtf2t2g,ncbitax,of2orthomap,plaza2orthomap,qlin}
+   {cds2aa,gtf2t2g,ncbitax,of2orthomap,orthomcl2orthomap,plaza2orthomap,qlin}
                        sub-commands help
    cds2aa              translate CDS to AA and optional retain longest
                        isoform <cds2aa -h>
-     gtf2t2g             extracts transcript to gene table from GTF <gtf2t2g
-                         -h>
+     gtf2t2g             extract transcript to gene table from GTF
+                         <gtf2t2g -h>
    ncbitax             update local ncbi taxonomy database <ncbitax -h>
-     of2orthomap         extract orthomap from OrthoFinder output for query
-                         species <orthomap -h>
-     plaza2orthomap      extract orthomap from PLAZA gene family data for query
-                         species <of2orthomap -h>
+     of2orthomap         extract orthomap from OrthoFinder output for
+                         query species <of2orthomap -h>
+     orthomcl2orthomap   extract orthomap from orthomcl output for
+                         query species <orthomcl2orthomap -h>
+     plaza2orthomap      extract orthomap from PLAZA gene family data
+                         for query species <of2orthomap -h>
    qlin                get query lineage based on ncbi taxonomy <qlin -h>
```

To retrieve e.g. the lineage information for `Danio rerio` run the following command:

```shell
- $ oggmap qlin -q "Danio rerio"
+ $ oggmap qlin -q "Danio rerio" -dbname taxadb.sqlite
```

## Development Version