diff --git a/docs/concatenating_thickets.rst b/docs/concatenating_thickets.rst new file mode 100644 index 00000000..d7a294c1 --- /dev/null +++ b/docs/concatenating_thickets.rst @@ -0,0 +1,64 @@ +.. + Copyright 2022 Lawrence Livermore National Security, LLC and other + Thicket Project Developers. See the top-level LICENSE file for details. + + SPDX-License-Identifier: MIT + +*************************** + Concatenating Thickets +*************************** + +Thicket does **not** implement any native profile readers (meaning readers that read performance profiles directly into Thicket objects). Instead, every reader in Thicket uses a Hatchet reader to read each performance profile into a :code:`Hatchet.GraphFrame` (Fig 1B), which is then converted to a :code:`Thicket.Thicket` object containing one profile using the :code:`Thicket.thicketize_graphframe()` function (Fig 1C). The one-profile Thickets are then concatenated into a single Thicket object using the :code:`Thicket.concat_thickets()` function (Fig 1D), which is capable of concatenating a set of :math:`n` Thickets, :math:`T`: + +.. math:: + T = \{t_1, t_2, ..., t_n\} + +where each :math:`t_i` (:math:`i = 1, 2, ..., n`) contains :math:`p_i \geq 1` profiles. + +.. figure:: images/reading_thickets.png + :align: center + + Figure 1: Example of the Thicket reader process for 4 Caliper profiles. + +| + +The process of concatenating Thickets requires the unification of some structures like the :code:`Thicket.graph` (calltree), which is explained below. + +################## +Unifying Thickets +################## + +This section mainly refers to the :code:`Thicket.Ensemble._unify()` function. + +=================== +Unifying Calltrees +=================== + +*Unifying Calltrees* is the process of performing a **graph operation** (e.g. :code:`Hatchet.graph.union()`) on multiple :code:`Thicket.graph` objects. Comparing two graphs involves comparing :code:`Hatchet.Node` objects between the graphs. 
The :code:`Hatchet.graph.union()` function computes the union graph between two Hatchet graphs. For the union, nodes are compared by: + +1. `Their depth in the tree `_ - :code:`Node._depth`. +2. `Their frame `_ ("name" and "type") - :code:`Node.frame._tuple_repr` + +Nodes that match in #1 and #2 are merged in the resulting union graph as a new :code:`Hatchet.Node` object (`deep copy of the first node `_). Deep copies of nodes that do **not** match are inserted into the union graph at the appropriate depth. + +*Note:* Comparing nodes with the equality operator (:code:`==`) is not sufficient, as the equality operator only compares the :code:`Node._hatchet_nid`, which is not the same as the above comparison. + +*Note:* The :code:`Thicket.intersection()` function first applies the :code:`Hatchet.graph.union()` before computing the intersection of the graphs, since there is no :code:`Hatchet.graph.intersection()` function. + +====================== +Updating Node Objects +====================== + +Because Node objects must be identical between Thicket components (see :ref:`/thicket_properties.rst#nodes`), the resulting new nodes in the union graph must replace the old node objects in components like the :code:`Thicket.dataframe.index` (see `code `_). The :code:`Hatchet.graph.union()` function provides a dictionary mapping old nodes to new nodes. However, to avoid applying these updates after every union between two graphs, we `update a dictionary of all the node mappings `_ and apply the updates after all of the unions have been computed. This is **only** necessary when concatenating more than **two** Thickets, as only one union will be performed when concatenating two Thickets. We `apply this idea when reading files `_ to avoid this cost. + +#################### +Index Concatenation +#################### + +*Index Concatenation* refers to the process that happens for the performance and metadata tables. 
We concatenate the tables, which is essentially "stacking the rows on top of each other". Because we check that the performance profiles we concatenate are unique (:ref:`/thicket_properties.rst#profiles`), we do not need to worry about duplicate indices in either table. We sort the index of both tables, which interleaves the profiles in the MultiIndex of the performance table to visually group all of the profiles in the table for each node. An example of this operation can be seen in the :ref:`/thicket_tutorial.ipynb`, when :code:`axis="index"`. + +##################### +Column Concatenation +##################### + +*Column Concatenation* refers to the process that happens in the performance, metadata, and statistics tables. We create a MultiIndex out of the columns, such that for each metric, there is a higher level index label. An example of this operation can be seen in the :ref:`/thicket_tutorial.ipynb`, when :code:`axis="columns"`. \ No newline at end of file diff --git a/docs/images/reading_thickets.png b/docs/images/reading_thickets.png new file mode 100644 index 00000000..8bab652e Binary files /dev/null and b/docs/images/reading_thickets.png differ diff --git a/docs/index.rst b/docs/index.rst index cd1ad1ea..5589efde 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -64,6 +64,8 @@ If you encounter bugs while using thicket, you can report them by opening an iss :caption: Developer Docs developer_guide + thicket_properties + concatenating_thickets .. toctree:: :maxdepth: 2 diff --git a/docs/thicket_properties.rst b/docs/thicket_properties.rst new file mode 100644 index 00000000..133063b6 --- /dev/null +++ b/docs/thicket_properties.rst @@ -0,0 +1,50 @@ +.. + Copyright 2022 Lawrence Livermore National Security, LLC and other + Thicket Project Developers. See the top-level LICENSE file for details. 
+ + SPDX-License-Identifier: MIT + +#################### + Thicket Properties +#################### + +Thicket compositional operations assume certain properties about the state of the Thicket and its components. After each operation, we check these properties using the utility functions in :code:`thicket/utils.py` to ensure the Thicket is in a valid state. + +Nodes +===== + +:code:`hatchet.Node` objects represent regions from the executed program. The Thicket components that contain *Nodes* are: + + - :code:`Thicket.graph` + - :code:`Thicket.dataframe` + - :code:`Thicket.statsframe.graph` + - :code:`Thicket.statsframe.dataframe` + +1. :code:`utils.validate_nodes` - *Node* objects are identical between components. :code:`id(node1) == id(node2)`. + + - The :code:`Thicket.statsframe.graph` is the :code:`Thicket.graph`, so this is implicit. + +Profiles +========= + +A *profile* in Thicket is a unique identifier, which is directly mapped to the performance "profile" it represents (:code:`Thicket.profile_mapping`). The *profile* may either be an integer or a tuple. The Thicket components that contain profiles are: + + - :code:`Thicket.profile` + - :code:`Thicket.profile_mapping` + - :code:`Thicket.dataframe` + - :code:`Thicket.metadata` + +1. :code:`utils.validate_profile._validate_all_same` - *profiles* are **equal**. :code:`profile1 == profile2`. +2. :code:`utils.validate_profile._validate_no_duplicates` - There are no duplicate *profiles* in any component. +3. :code:`utils.validate_profile._validate_multiindex_column` - :code:`Thicket.dataframe` and :code:`Thicket.metadata` must both contain :code:`pd.MultiIndex` columns, if either one does. + + - If the columns are *MultiIndex*, the *profiles* are tuples, otherwise the *profiles* are integers. + +Performance Data +================== + +The :code:`Thicket.dataframe` contains the performance data and is checked for the following properties: + +1. 
:code:`utils.validate_dataframe._check_duplicate_inner_idx` - There are no duplicate indices. +2. :code:`utils.validate_dataframe._check_missing_hnid` - *Node* objects, identified by their :code:`_hatchet_nid`, are in ascending order without gaps. +3. :code:`utils.validate_dataframe._validate_name_column` - The values in the "name" column match the :code:`Node.frame["name"]` attribute for that row or are :code:`None`. \ No newline at end of file
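
The first two performance-data properties above (no duplicate indices, node ids ascending without gaps) can be illustrated with a small, simplified sketch. This is not Thicket's implementation: the helper names :code:`find_duplicate_index_entries` and :code:`find_missing_nids` are hypothetical, index entries are modeled as plain :code:`(node id, profile)` tuples rather than :code:`hatchet.Node` objects in a pandas index, and node ids are assumed to start at 0.

```python
from collections import Counter


def find_duplicate_index_entries(index_entries):
    """Return index entries that occur more than once, in first-seen order."""
    counts = Counter(index_entries)
    return [entry for entry, count in counts.items() if count > 1]


def find_missing_nids(nids):
    """Return node ids absent from the range 0..max(nids).

    Assumes node ids start at 0 (an assumption of this sketch).
    """
    present = set(nids)
    if not present:
        return []
    return [nid for nid in range(max(present) + 1) if nid not in present]


# Hypothetical (node id, profile) pairs standing in for the rows of a
# performance table; real Thicket tables are indexed by Node objects.
valid_index = [(0, 101), (0, 102), (1, 101), (1, 102), (2, 101), (2, 102)]
broken_index = [(0, 101), (0, 101), (3, 101)]

for name, index in [("valid", valid_index), ("broken", broken_index)]:
    duplicates = find_duplicate_index_entries(index)
    missing = find_missing_nids([nid for nid, _ in index])
    print(name, "duplicates:", duplicates, "missing node ids:", missing)
```

In this model a table is considered valid only when both helpers return an empty list; the broken example fails both checks because :code:`(0, 101)` appears twice and node ids 1 and 2 are absent.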