@@ -103,18 +103,21 @@ places mutations ("characters" in phylogenetic terminology) on a given tree.
103
103
## Tree traversals
104
104
105
105
Given a single {class}` Tree ` , traversals in various orders are possible using the
106
- {meth}` ~Tree.nodes ` iterator. For example, in the following tree we can visit the
107
- nodes in different orders:
106
+ {meth}` ~Tree.nodes ` iterator. Take the following tree:
108
107
109
108
110
109
``` {code-cell} ipython3
111
110
import tskit
112
- from IPython.display import SVG, display
113
111
114
112
ts = tskit.load("data/tree_traversals.trees")
115
113
tree = ts.first()
116
- display(SVG(tree.draw_svg()))
114
+ tree.draw_svg()
115
+ ```
116
+
117
+ We can visit the nodes in different orders by providing an ` order ` parameter to
118
+ the {meth}` Tree.nodes ` iterator:
117
119
120
+ ``` {code-cell} ipython3
118
121
for order in ["preorder", "inorder", "postorder"]:
119
122
print(f"{order}:\t", list(tree.nodes(order=order)))
120
123
```
@@ -225,7 +228,7 @@ different time points:
225
228
``` {code-cell} ipython3
226
229
ts = tskit.load("data/different_time_samples.trees")
227
230
tree = ts.first()
228
- SVG( tree.draw_svg(y_axis=True, time_scale="rank") )
231
+ tree.draw_svg(y_axis=True, time_scale="rank")
229
232
```
230
233
231
234
The generation times for these nodes are as follows:
@@ -280,42 +283,55 @@ print(dict(zip(range(3), nearest_neighbor_of)))
280
283
281
284
## Parsimony
282
285
283
- The {meth} ` Tree.map_mutations ` method finds a parsimonious explanation for a
284
- set of discrete character observations on the samples in a tree using classical
285
- phylogenetic algorithms.
286
+ Take a site on the following tree with three allelic states, where the
287
+ samples are coloured by the allele they possess, but where we don't know
288
+ the position of the mutations that caused this variation:
286
289
287
290
``` {code-cell} ipython3
288
291
tree = tskit.load("data/parsimony_simple.trees").first()
289
292
alleles = ["red", "blue", "green"]
290
293
genotypes = [0, 0, 0, 0, 1, 2]
291
294
styles = [f".n{j} > .sym {{fill: {alleles[g]}}}" for j, g in enumerate(genotypes)]
292
- display(SVG(tree.draw_svg(style="".join(styles))))
295
+ tree.draw_svg(style="".join(styles))
296
+ ```
293
297
298
+ The {meth}` Tree.map_mutations ` method finds a parsimonious explanation for a
299
+ set of discrete character observations on the samples in a tree using classical
300
+ phylogenetic algorithms:
301
+
302
+ ``` {code-cell} ipython3
294
303
ancestral_state, mutations = tree.map_mutations(genotypes, alleles)
295
304
print("Ancestral state = ", ancestral_state)
296
305
for mut in mutations:
297
306
print(f"Mutation: node = {mut.node} derived_state = {mut.derived_state}")
298
307
```
299
308
300
- So , the algorithm has concluded, quite reasonably, that the most parsimonious
309
+ In this case , the algorithm has concluded, quite reasonably, that the most parsimonious
301
310
description of this state is that the ancestral state is red and there was
302
311
a mutation to blue and green over nodes 4 and 5.
303
312
304
313
### Building tables
305
314
306
- One of the main uses of {meth}` Tree.map_mutations ` is to position mutations on a tree
307
- to encode observed data. In the following example we show how a set
308
- of tables can be updated using the {ref}` Tables API<tskit:sec_tables_api> ` ; here we
309
- infer the location of mutations in an simulated tree sequence of one tree,
310
- given a set of site positions with their genotypes and allelic states:
315
+ Below we show how a set of tables can be updated using the
316
+ {ref}` Tables API<tskit:sec_tables_api> ` , taking advantage of the
317
+ {meth}` Tree.map_mutations ` method to identify parsimonious positions
318
+ for mutations on a tree. Here's the tree we'll use:
311
319
312
320
``` {code-cell} ipython3
313
321
import pickle
314
322
ts = tskit.load("data/parsimony_map.trees")
323
+ ts.draw_svg(size=(500, 300), time_scale="rank")
324
+ ```
325
+
326
+ Now we can modify the tables by adding mutations. To find the location of mutations,
327
+ we infer them from some observed data (some site positions with associated genotypes
328
+ and allelic states, in the conventional {class}` tskit encoding <Variant> ` ):
329
+
330
+
331
+ ``` {code-cell} ipython3
315
332
with open("data/parsimony_map.pickle", "rb") as file:
316
333
data = pickle.load(file) # Load saved variant data from a file
317
- display(SVG(ts.draw_svg(size=(500, 300), time_scale="rank")))
318
- print("Variant data: pos, genotypes & alleles as described by the ts.variants() iterator:")
334
+ print("Variant data: each site has a position, allele list, and genotypes array:")
319
335
for i, v in enumerate(data):
320
336
print(f"Site {i} (pos {v['pos']:7.4f}): alleles: {v['alleles']}, genotypes: {v['genotypes']}")
321
337
print()
@@ -333,11 +349,13 @@ for variant in data:
333
349
parent += parent_offset
334
350
mut_id = tables.mutations.add_row(
335
351
site_id, node=mut.node, parent=parent, derived_state=mut.derived_state)
336
- info += f", and places mutation {mut.id } to {mut.derived_state} above node {mut.node}"
352
+ info += f", and places mutation {mut_id } to {mut.derived_state} above node {mut.node}"
337
353
print(info)
338
354
new_ts = tables.tree_sequence()
339
355
```
340
356
357
+ And here are the parsimoniously positioned mutations on the tree
358
+
341
359
``` {code-cell} ipython3
342
360
mut_labels = {} # An array of labels for the mutations
343
361
for mut in new_ts.mutations(): # Make pretty labels showing the change in state
@@ -346,34 +364,37 @@ for mut in new_ts.mutations(): # Make pretty labels showing the change in state
346
364
prev = new_ts.mutation(mut.parent).derived_state if older_mut else site.ancestral_state
347
365
mut_labels[site.id] = f"{mut.id}: {prev}→{mut.derived_state}"
348
366
349
- display(SVG( new_ts.draw_svg(size=(500, 300), mutation_labels=mut_labels, time_scale="rank")) )
367
+ new_ts.draw_svg(size=(500, 300), mutation_labels=mut_labels, time_scale="rank")
350
368
```
351
369
352
370
353
371
### Parsimony and missing data
354
372
355
- The Hartigan parsimony algorithm in {meth} ` Tree.map_mutations ` can also take missing data
373
+ We can also take missing data
356
374
into account when finding a set of parsimonious state transitions. We do this by
357
375
specifying the special value {data}` tskit.MISSING_DATA ` (-1) as the state, which is
358
376
treated by the algorithm as "could be anything".
359
377
360
- For example, here we state that sample 0 is missing, and use the colour white to indicate
361
- this:
378
+ For example, here we state that sample 0 is missing, indicated by the colour white:
362
379
363
380
``` {code-cell} ipython3
364
381
tree = tskit.load("data/parsimony_simple.trees").first()
365
382
alleles = ["red", "blue", "green", "white"]
366
383
genotypes = [tskit.MISSING_DATA, 0, 0, 0, 1, 2]
367
384
styles = [f".n{j} > .sym {{fill: {alleles[g]}}}" for j, g in enumerate(genotypes)]
368
- display(SVG(tree.draw_svg(style="".join(styles))))
385
+ tree.draw_svg(style="".join(styles))
386
+ ```
387
+
388
+ Now we run the {meth}` Tree.map_mutations ` method, which applies the Hartigan parsimony
389
+ algorithm:
369
390
391
+ ``` {code-cell} ipython3
370
392
ancestral_state, mutations = tree.map_mutations(genotypes, alleles)
371
393
print("Ancestral state = ", ancestral_state)
372
394
for mut in mutations:
373
395
print(f"Mutation: node = {mut.node} derived_state = {mut.derived_state}")
374
396
```
375
397
376
-
377
398
The algorithm decided, again, quite reasonably, that the most parsimonious explanation
378
399
for the input data is the same as before. Thus, if we used this information to fill
379
400
out mutation table as above, we would impute the missing value for 0 as red.
@@ -389,7 +410,7 @@ ancestral_state, mutations = tree.map_mutations(genotypes, alleles)
389
410
print("Ancestral state = ", ancestral_state)
390
411
for mut in mutations:
391
412
print(f"Mutation: node = {mut.node} derived_state = {mut.derived_state}")
392
- SVG( tree.draw(node_colours=node_colours) )
413
+ tree.draw(node_colours=node_colours)
393
414
```
394
415
395
416
0 commit comments