Perhaps the biggest difference between `tskit` and other phylogenetic libraries is that
288
-
each node *must* have a time associated with it. Node times can be arbitrary (marked
289
-
by setting {attr}`TreeSequence.time_units` to `"uncalibrated"`), but they must
290
-
be present. This means that `tskit` trees are always directional (i.e. they are
291
-
"rooted").
287
+
Perhaps the most noticeable difference between a `tskit` tree and the encoding of trees
288
+
in other phylogenetic libraries is that `tskit` does not explicitly store branch lengths.
289
+
Instead, each node has a *time* associated with it. Branch lengths can therefore be
290
+
found by calculating the difference between the time of a node and the time of its
291
+
parent node.
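
As a rough illustration of the point above, the sketch below derives branch lengths from node times. It is plain Python with made-up data: the `node_time` and `parent` dictionaries and the `branch_length` helper are hypothetical stand-ins, not part of `tskit`. With a real `tskit` tree, the corresponding values would come from `tree.time(u)` and `tree.parent(u)`.

```python
# Hypothetical node times and parent links for one small tree (not tskit objects).
# Nodes 0-2 are samples at time 0; node 4 is the root and has no parent entry.
node_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.5, 4: 4.0}
parent = {0: 3, 1: 3, 2: 4, 3: 4}


def branch_length(u):
    """Return the length of the branch above node u.

    Assumption: a node with no parent (a root) is given length 0,
    which is believed to match the convention of Tree.branch_length.
    """
    if u not in parent:
        return 0.0
    return node_time[parent[u]] - node_time[u]


# Branch length = time of the parent minus time of the node itself.
lengths = {u: branch_length(u) for u in node_time}
print(lengths)
```

Because only times are stored, changing a single node time implicitly changes the lengths of every branch that touches that node.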
292
292
293
-
The primary reason for this strict requirement is to ensure temporal consistency across
294
-
the trees in the tree sequence. In particular it ensures that
295
-
a node cannot be a parent of a node in one tree, and a child of the same node in another
296
-
tree in the tree sequence, a logical impossibility.
293
+
Since nodes *must* have a time, `tskit` trees always have these (implicit) branch
294
+
lengths. To represent a tree ("cladogram") in which the branch lengths are not
295
+
meaningful, the {attr}`TreeSequence.time_units` of a tree sequence can be
296
+
specified as `"uncalibrated"` (see below).
297
297
298
-
The units in which time is measured are stored in the {attr}`TreeSequence.time_units`
299
-
attribute: if not known, this defaults to "unknown":
298
+
Another implication of storing node times rather than branch lengths is that `tskit`
299
+
trees are always directional (i.e. they are "rooted"). The reason that `tskit` stores
300
+
times of nodes (rather than e.g. genetic distances between them) is to ensure temporal
301
+
consistency. In particular it makes it impossible for a node to be an ancestor of a
302
+
node in one tree, and a descendant of the same node in another tree in the tree sequence.
303
+
This is of critical importance when extending the concept of genetic ancestry to
304
+
{ref}`sec_phylogen_multiple_trees` along a genome.
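
The temporal-consistency argument can be sketched in a few lines of plain Python. Everything here is hypothetical illustration rather than `tskit` API: the three parent maps stand in for trees that share one set of nodes, and `is_temporally_consistent` is an invented helper. The assumption is that every parent must be strictly older than its child.

```python
# One shared set of node times for all trees (hypothetical data, not tskit objects).
node_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.0}

# Two trees over the same nodes, each given as a child -> parent map.
tree_a = {0: 3, 1: 3, 2: 4, 3: 4}  # node 4 is an ancestor of node 3
tree_b = {1: 3, 2: 3, 0: 4, 3: 4}  # different topology, same direction in time

# A tree that tries to make node 4 a descendant of node 3.
tree_bad = {0: 4, 1: 4, 4: 3, 2: 3}


def is_temporally_consistent(parent_map, times):
    """Check that every parent is strictly older than its child."""
    return all(times[p] > times[c] for c, p in parent_map.items())


for name, tree in [("tree_a", tree_a), ("tree_b", tree_b), ("tree_bad", tree_bad)]:
    print(name, is_temporally_consistent(tree, node_time))
```

Since the node times are shared, no assignment of times can satisfy both `tree_a` and `tree_bad`: node 4 would have to be both older and younger than node 3.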
305
+
306
+
The {attr}`TreeSequence.time_units` attribute stores the units in which time is
307
+
measured: if not known, this defaults to "unknown":
300
308
301
309
```{code-cell}
302
310
print("Time units are", tree.tree_sequence.time_units)
303
311
tree.draw_svg(y_axis=True)
304
312
```
305
313
306
-
The fact that nodes have times also means that `tskit` does not explicitly store
307
-
branch lengths: a branch length is simply the difference between the time of a
308
-
node and the time of its parent node. For convenience, however, `tskit` provides a
314
+
Although branch lengths are not stored explicitly, for convenience `tskit` provides a
309
315
{meth}`Tree.branch_length` method:
310
316
311
317
```{code-cell}
@@ -412,30 +418,87 @@ has not been simulated.
412
418
## Phylogenetic methods
413
419
414
420
:::{todo}
415
-
Demo some phylo methods. e.g.
421
+
Demo some phylogenetic methods. e.g.
416
422
1. Total branch length - demo quick calculation across multiple trees - incremental algorithm used extensively in population genetics. ("bringing tree thinking to popgen").
417
423
2. KC distance
418
424
3. Balance metrics
419
-
4. Topology rankings
425
+
4. Topology rankings (see https://github.com/tskit-dev/tutorials/issues/93)
420
426
:::
421
427
422
428
423
429
(sec_phylogen_unified_structure)=
424
430
## Storing and accessing genetic data
425
431
426
-
:::{todo}
427
-
Add content and link to {ref}`sec_what_is_dna_data`.
428
-
:::
432
+
`Tskit` has been designed to capture both evolutionary tree topologies and the genetic
433
+
sequences that evolve along the branches of these trees. This is achieved by defining
434
+
{ref}`sec_terminology_mutations_and_sites` which are associated with specific positions
435
+
along the genome.
436
+
437
+
```{code-cell}
438
+
import msprime # The `msprime` package can throw mutations onto a tree sequence