Component_1 (Advanced Metrics Extractor)
Component 1 (the Advanced Metrics Extractor from now on) extracts complex information from the input BPMN model. Within the scope of this project, a metric is considered advanced if it is derived by computation and/or aggregation from basic metrics and from the model's elements.
Metrics of this type allow an in-depth analysis of the model and expose a wide variety of information, ranging from graph-theoretic measures to proportions between model elements.
Since all of these metrics are derived from the work of BPM experts from all over the world, we group them according to the papers we used as sources for our studies and implementations.
- Applying software metrics to evaluate business process models
- Control-flow complexity measurement of processes and Weyuker’s properties
- A Discourse on Complexity of Process Models
- Prediction Models for BPMN Usability and Maintainability
- On a quest for good process models: the cross-connectivity metric
- Quality metrics for business process modeling
- Adopting the Cognitive Complexity Measure for Business Process Models
- Complexity metrics for business process models
- Cohesion and coupling metrics for workflow process design
- Finding a complexity measure for business process models
- Metrics for Process Models
- What makes process models understandable?
- Complexity metrics for workflow nets
- Proposal of square metrics for measuring business process model complexity
- Investigating layout complexity
- Guidelines on the aesthetic quality of UML class diagrams
Applying software metrics to evaluate business process models
by Rolón E, Ruiz F, García F, Piattini M (2006)
Some of the metrics in this paper coincide with metrics extracted by the Basic Metrics Extractor but, for the sake of completeness, we list them anyway. They are self-explanatory and need no further comment.
- TNT: total number of Tasks
- TNCS: total number of Collapsed Subprocesses
- TNA: total number of Activities
- TNDO: total number of Data Objects
- TNG: total number of Gateways
- TNEE: total number of End Events
- TNIE: total number of Intermediate Events
- TNSE: total number of Start Events
- TNE: total number of Events
- TNSF: total number of Sequence Flows
Two other metrics measure the connectivity level of specific elements in the model, namely activities and participants (pools). Each value is a ratio between the number of such elements and the number of flows that connect them.
- CLA: connectivity level between activities (TNA/NSFA)
- CLP: connectivity level between participants (NMF/NP)
The last four metrics measure various kinds of proportions between elements of the model.
- PDOPin: proportion of data objects as incoming products and total data objects (NDOIn/TNDO)
- PDOPout: proportion of data objects as outgoing products and total data objects (NDOOut/TNDO)
- PDOTOut: proportion of data objects as outgoing product of activities of the model (NDOOut/TNT)
- PLT: proportion of pools/lanes and activities (NL/TNT)
Control-flow complexity measurement of processes and Weyuker’s properties
by J. Cardoso (2007, doi = {10.1007/11837862_13})
Cardoso's first metric is the Control-Flow Complexity. It represents a weighted sum of all connectors that are used in a process model. In particular:
- every Exclusive (split) Gateway's value corresponds to the number of its outgoing flows;
- every Inclusive (split) Gateway's value corresponds to 2^n - 1, where n is the number of its outgoing flows;
- every Parallel (split) Gateway's value corresponds to 1.
The other types of Gateway are not covered in the original source, so they are not considered. The complexity value affects the readability, maintainability, reliability and other properties of the model.
- CFC: control-flow Complexity
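The weighting rules above can be sketched in a few lines of Python. This is only an illustration, not the extractor's actual code: the function name and the `(gateway_type, fan_out)` input format are assumptions made for the example.

```python
from typing import Iterable, Tuple


def control_flow_complexity(splits: Iterable[Tuple[str, int]]) -> int:
    """Sum the CFC contribution of each split gateway.

    `splits` holds (gateway_type, number_of_outgoing_flows) pairs;
    the type labels are hypothetical names chosen for this sketch.
    """
    total = 0
    for kind, fan_out in splits:
        if kind == "exclusive":
            total += fan_out            # one state per outgoing flow
        elif kind == "inclusive":
            total += 2 ** fan_out - 1   # every non-empty subset of flows
        elif kind == "parallel":
            total += 1                  # a single possible state
        # any other gateway type is ignored, as stated in the text
    return total


example = [("exclusive", 3), ("inclusive", 2), ("parallel", 4), ("complex", 2)]
print(control_flow_complexity(example))  # 3 + 3 + 1 = 7
```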
A Discourse on Complexity of Process Models
by J. Cardoso, J. Mendling, G. Neumann, H.A. Reijers (2006, doi = {10.1007/11837862_13}), contained in Business Process Management Workshops by Johann Eder, Schahram Dustdar (Eds.), chapter 13, pp. 117-128
Three metrics of this paper are based on the number of Activities and Gateways in the model.
- NOA: number of Activities
- NOAC: number of Activities and Control-Flow
- NOAJS: number of Activities, Joins and Splits
Three other metrics are based on the works of Halstead, whose measures are among the most important in the field of software complexity. Those metrics are based on four values. We report their original meaning and the meaning in the BPMN field:
- n1 = number of unique operators => number of unique activities and control-flow elements
- n2 = number of unique operands => number of unique data variables
- N1 = total number of operator occurrences => total number of activities and control-flow elements
- N2 = total number of operand occurrences => total number of data variables
From those numbers, we can derive the Halstead-based Process Complexity (HPC) measures for process length, volume and difficulty. They are calculated as follows:
Process Length: N = n1*log2(n1) + n2*log2(n2)
Process Volume: V = (N1+N2)*log2(n1+n2)
Process Difficulty: D = (n1/2)*(N2/n2)
Thus we get three metrics:
- HPC_D: Halstead-based Process Complexity (process difficulty)
- HPC_N: Halstead-based Process Complexity (process length)
- HPC_V: Halstead-based Process Complexity (process volume)
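As a worked illustration of the three formulas, here is a minimal Python sketch. The function name is hypothetical, and it assumes that n1 and n2 are both greater than zero (otherwise the logarithms and the division are undefined).

```python
import math


def halstead_process_complexity(n1: int, n2: int, N1: int, N2: int):
    """Return (length, volume, difficulty) from the four Halstead counts.

    Assumes n1 > 0 and n2 > 0.
    """
    length = n1 * math.log2(n1) + n2 * math.log2(n2)
    volume = (N1 + N2) * math.log2(n1 + n2)
    difficulty = (n1 / 2) * (N2 / n2)
    return length, volume, difficulty


# 4 unique operators, 2 unique operands, 10 operator and 6 operand occurrences
length, volume, difficulty = halstead_process_complexity(4, 2, 10, 6)
print(length, round(volume, 3), difficulty)  # 10.0 41.359 6.0
```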
The paper also discusses a software complexity metric based on the impact of the information flow on a program’s structure. It is adapted to evaluate the complexity of processes in BPM, yielding the Interface Complexity (IC), which is defined as:
IC = Length * (number of inputs * number of outputs)^2
When computing the complexity of software, length is the number of lines of code (LOC), and the numbers of inputs and outputs are the flows of local information entering and leaving. For BPM models, the length of an activity is 1 if it is a black box, and its LOC if it is a white box (we always consider activities as black boxes, so the length is simply the number of activities in the model); the fan-in and fan-out are the numbers of Data Input and Data Output Associations. The four metrics that we obtain are:
- NoI: number of Activities inputs (Fan-In)
- NoO: number of Activities outputs (Fan-Out)
- Lenght: Activities length (number of activities)
- IC: Interface Complexity of Activities
The last metric discussed in the paper is the NOF, the number of arcs present in the model.
- NOF: number of Control Flow connections (number of arcs)
Prediction Models for BPMN Usability and Maintainability
by Rolón E, Sanchez L, Garcia F, Ruiz F, Piattini M, Caivano D, Visaggio G (2009, doi = {10.1007/11837862_13})
This paper presents the number of Sequence Flows metric, which is equal to the NOF metric and is already extracted by the Basic Metrics Extractor.
- TNSF: total number of Sequence Flows
On a quest for good process models: the cross-connectivity metric
by Vanderfeesten I, Reijers HA, Mendling J, van der Aalst WM, Cardoso J (2008, doi = {10.1007/978-3-540-69534-9_36})
The Cross-Connectivity metric is used to "measure the strength of the links between process model elements", that is, to measure the difficulty of the mental operations that a reader has to perform in order to understand the model. It is based on the "weakest-link metaphor": what counts the most is the part of the model that is hardest to understand. A model with a lower CC value is more prone to contain errors, because it is harder to understand. To compute this value, we first calculate the weight of every node in the model. Let d be the degree of the node (the number of its incoming and outgoing flows):
- if the node is an Exclusive Gateway, its weight is 1/d;
- if it is an Inclusive Gateway, its weight is 1 / (2^d - 1) + ((2^d - 2) / (2^d - 1)) * (1 / d);
- otherwise, it is 1.
The paper does not explicitly consider every kind of BPMN element, so we assign a weight of 1 to the node types it does not mention. Once every node has a weight, we calculate the weight of the arcs. The weight of an arc is the product of the weight of its source node and the weight of its target node.
W(a) = w(src(a)) · w(dest(a))
With the weight of every arc in the model, we can obtain the value of every path. A path is the sequence of arcs that should be followed to get from a node n1 to a node n2. Its value is the product of the weights of every arc in the path.
v(p) = W(a1) ·W(a2) · ... ·W(ax)
The value of a connection between any given pair of nodes n1 and n2 is the maximum value over the set of paths from node n1 to node n2. If the nodes are not connected, the value of the connection is 0.
V(n1, n2) = max[p∈P(n1,n2)] v(p)
Finally, with the values of the connections between every pair of nodes in the model, we can obtain the Cross-Connectivity value. It is defined as follows:
CC = Sum[n1,n2∈N]V(n1, n2) / (|N| · (|N| − 1))
- CC: Cross-Connectivity
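The whole computation (node weights, arc weights, best path per pair, final average) can be sketched as follows. This is a simplified illustration and not the component's real code: the function names, the type labels and the dictionary-based model are assumptions. It also assumes that every gateway has at least one flow, that pairs with n1 = n2 are left out of the sum, and it uses a Floyd-Warshall style closure to find the best path, which is valid here because no arc weight exceeds 1 and so cycles can never improve a path.

```python
from itertools import product


def node_weight(kind: str, degree: int) -> float:
    """Weight of a node given its type label and its degree d."""
    if kind == "exclusive":
        return 1 / degree
    if kind == "inclusive":
        combos = 2 ** degree - 1
        return 1 / combos + ((2 ** degree - 2) / combos) * (1 / degree)
    return 1.0  # every other node type


def cross_connectivity(nodes: dict, arcs: list) -> float:
    """nodes maps a node id to its type label; arcs is a list of (src, dst)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    degree = {node: 0 for node in nodes}
    for src, dst in arcs:
        degree[src] += 1
        degree[dst] += 1
    weight = {node: node_weight(kind, degree[node]) for node, kind in nodes.items()}

    # best[a][b] = highest product of arc weights over all paths from a to b
    best = {a: {b: 0.0 for b in nodes} for a in nodes}
    for src, dst in arcs:
        best[src][dst] = max(best[src][dst], weight[src] * weight[dst])
    # k is the outermost loop variable, as Floyd-Warshall requires
    for k, i, j in product(nodes, repeat=3):
        best[i][j] = max(best[i][j], best[i][k] * best[k][j])

    total = sum(best[a][b] for a in nodes for b in nodes if a != b)
    return total / (n * (n - 1))


nodes = {"A": "task", "X": "exclusive", "B": "task", "C": "task"}
arcs = [("A", "X"), ("X", "B"), ("X", "C")]
print(round(cross_connectivity(nodes, arcs), 4))  # 0.1019
```

In the example the gateway X has degree 3, so every arc weighs 1/3, the two-arc paths A to B and A to C are worth 1/9 each, and CC = (3 * 1/3 + 2 * 1/9) / 12 = 11/108.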
Quality metrics for business process modeling
by Khlif W, Makni L, Zaaboub N, Ben-Abdallah H (2009)
The aim of the paper is to adapt OO software metrics to BPMN models. Besides some metrics that we have already covered, such as the Halstead-based ones or the IC, the paper defines the Imported Coupling of a Process and the Exported Coupling of a Process. These metrics provide a quality value that represents the coupling of a model. The ICP and the ECP are given, respectively, by the sum of the outgoing and of the incoming flows of each task, including each task contained in the process in the case of subprocesses.
- ICP: Imported Coupling of a Process
- ECP: Exported Coupling of a Process
Adopting the Cognitive Complexity Measure for Business Process Models
by Gruhn V, Laue R (2006, doi = {10.1109/COGINF.2006.365702})
Like others we have covered, this paper aims to provide a metric that measures the understandability and maintainability of a Business Process Model. To this end, the authors start from studies on the cognitive weights of the basic control structures of programming languages and adapt them to BPMN structures, obtaining the Cognitive Weight metric. According to the paper, 8 types of structures can be found in a model, each with a different weight:
- Sequence: a sequence of simple consecutive steps -> Weight: 1
- Exclusive Choice 1: Exclusive split Gateways with 2 branches -> Weight: 2
- Exclusive Choice 2: Exclusive split Gateways with more than 2 branches -> Weight: 3
- Parallel Split and Synchronization: Parallel Gateways -> Weight: 4
- Multiple Choice and Synchronizing Merge: Inclusive Gateways -> Weight: 7
- User-defined Function: Subprocesses -> Weight: 2
- Multiple Instances Patterns: Multiple Instance Loop Characteristics -> Weight: 6
- Cancel Activity: Cancel Events -> Weight: 1
According to the paper, there is also another type of structure, the Cancel Case, a cancellation that deactivates all elements within another part of the model. We could not find an equivalent in the BPMN notation, so we decided not to implement it. The sum of the weights of all structures present in the model is its Cognitive Weight value.
- W: Cognitive Weight
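Once the structures of a model have been identified, the Cognitive Weight is a plain lookup and sum. The sketch below only illustrates that last step: the dictionary keys and the function name are hypothetical labels, and detecting the structures in a real BPMN model is outside its scope.

```python
# Weights taken from the list above; the keys are labels invented for this sketch.
COGNITIVE_WEIGHTS = {
    "sequence": 1,
    "exclusive_choice_two_branches": 2,
    "exclusive_choice_many_branches": 3,
    "parallel": 4,
    "inclusive": 7,
    "subprocess": 2,
    "multiple_instances": 6,
    "cancel_activity": 1,
}


def cognitive_weight(structures) -> int:
    """Sum the weight of every structure found in the model."""
    return sum(COGNITIVE_WEIGHTS[name] for name in structures)


found = ["sequence", "exclusive_choice_two_branches", "parallel", "subprocess"]
print(cognitive_weight(found))  # 1 + 2 + 4 + 2 = 9
```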
Complexity metrics for business process models
by Gruhn V, Laue R (2006)
The Nesting Depth of a node is "the number of decisions in the control flow that are necessary to perform this action". The authors state that this value affects the overall complexity of the model: the greater the nesting depth, the greater the complexity. This leads to the two metrics presented in the paper: the Maximum Nesting Depth and the Mean Nesting Depth. In our implementation, the Nesting Depth is incremented only by Exclusive and Complex Gateways. The paper does not cover the case of a node reachable through several paths that yield different Nesting Depth values; in that case we adopt the minimum among them.
- MaxND: Maximum nesting depth
- MeanND: Mean nesting depth
Cohesion and coupling metrics for workflow process design
by Reijers HA, Vanderfeesten IT (2004, doi = {10.1007/978-3-540-25970-1_19})
"The coupling metric determines the number of related activities for each activity." It is given by the number of pairs of activities that are connected by a sequence flow, divided by the number of activities in the model times the maximum number of activities each one can be coupled with (the number of activities minus 1):
CP = Sum[s,t∈T] connected(s, t) / (|T| * (|T| - 1)), where T is the set of the activities in the model
- CP: Coupling
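A minimal sketch of this ratio is shown below. It rests on an assumption that the text does not settle: connected(s, t) is treated as symmetric, so a flow from s to t couples both the pair (s, t) and the pair (t, s). The function name and the input format are also illustrative.

```python
def coupling(activities, flows) -> float:
    """activities: iterable of activity ids; flows: list of (src, dst) pairs.

    Assumption: connected(s, t) is symmetric and ignores the flow direction.
    """
    acts = set(activities)
    n = len(acts)
    if n < 2:
        return 0.0
    connected = set()
    for src, dst in flows:
        if src in acts and dst in acts and src != dst:
            connected.add((src, dst))
            connected.add((dst, src))
    return len(connected) / (n * (n - 1))


# A -> B -> C: four ordered pairs are connected out of six possible ones
print(round(coupling(["A", "B", "C"], [("A", "B"), ("B", "C")]), 3))  # 0.667
```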
Finding a complexity measure for business process models
by Latva-Koivisto AM (2001)
- CNC: Coefficient of Network Complexity or Connectivity coefficient
Metrics for Process Models
by Jan Mendling (2008), chapter 4
This book is probably the most complete and precise source on the analysis and metrics of BPMN models; several of what we could consider the "main" metrics come from it. It mostly views and analyses BPMN models as graphs, so some degree of Graph Theory is involved. We present the metrics following the same structure as the book (and thus as our classes).
The Size of a model is simply the number of nodes in it. A larger model is more likely to contain errors than a smaller one.
- Sn: size
The Diameter of a model is "the length of the longest path from a start node to an end node". As with Size, a model with a greater diameter is more likely to contain errors than one with a smaller diameter.
- diam: diameter
A higher Density of a model is associated with a higher error probability. It is obtained by dividing the number of arcs (flows) by the number of nodes times the number of nodes minus 1.
Δ(G) = |A| / (|N| * (|N| - 1)), where A is the set of arcs and N the set of nodes
- Δ(G): density
The Sequentiality represents the presence of simple consecutive nodes in the model, this being the simplest structure a model can contain. A process with high Sequentiality should be less prone to errors. It is obtained as follows:
Ξ(G) = |A ∩ (T × T)| / |A| -> number of arcs between non-connector nodes divided by the number of arcs
If every arc connects only non-connector nodes, the Sequentiality is 1.
- Ξ(G): sequentiality
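Density and Sequentiality are both simple ratios over the arc set, as the following sketch shows. The function names and the representation of the model (a list of node ids, a set of connector ids and a list of arcs) are assumptions made for the example; it also assumes at least two nodes and at least one arc.

```python
def density(nodes, arcs) -> float:
    """Number of arcs divided by the maximum possible number of arcs."""
    n = len(nodes)
    return len(arcs) / (n * (n - 1))


def sequentiality(connectors, arcs) -> float:
    """Share of arcs whose source and target are both non-connector nodes."""
    plain = [1 for src, dst in arcs if src not in connectors and dst not in connectors]
    return len(plain) / len(arcs)


nodes = ["A", "B", "X", "C", "D"]
connectors = {"X"}  # X is a gateway
arcs = [("A", "B"), ("B", "X"), ("X", "C"), ("X", "D")]
print(density(nodes, arcs))            # 4 / (5 * 4) = 0.2
print(sequentiality(connectors, arcs))  # only A -> B qualifies: 0.25
```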
The Depth of a node n is based on two values: the in-depth λin(n) and the out-depth λout(n). They represent, respectively, the number of split nodes minus the number of join nodes, and vice versa, that have to be visited in order to get to n. The in-depth is based on the values of the node's predecessors, and the out-depth on those of its successors. The depth λ(n) of a node is the minimum of its two depths, and the depth ^ of a model is the maximum depth among its nodes. The higher the depth, the higher the probability of errors in the model.
- ^: depth
The Connector Mismatch metric gives the number of gateway mismatches in the model. A model with a high Connector Mismatch value is likely to include more errors, because of the problems that can arise in the handling of tokens.
- MM: connector mismatch
The Connector Heterogeneity metric gives the entropy over the connector types; in other words, it reflects how many different types of gateways are used in the model. A higher Connector Heterogeneity can lead to a higher chance of mismatches, and thus to more errors. This value is obtained as follows:
CH(G) = −SUM[g∈{and,xor,or}] p(g) * log3(p(g)), where p(g) is the relative frequency of gateway type g
- CH: connector heterogeneity
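The entropy formula can be sketched as below. The function name and the dictionary of counts are illustrative assumptions; gateway types that do not occur are skipped, following the usual convention that 0 * log(0) counts as 0.

```python
import math


def connector_heterogeneity(counts: dict) -> float:
    """counts maps 'and', 'xor' and 'or' to the number of gateways of that type."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no gateways at all
    ch = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            ch -= p * math.log(p, 3)  # logarithm in base 3
    return ch


# Three types used equally often give the maximum value of 1
print(round(connector_heterogeneity({"and": 2, "xor": 2, "or": 2}), 6))  # 1.0
# A single type gives the minimum value of 0
print(round(connector_heterogeneity({"and": 0, "xor": 5, "or": 0}), 6))  # 0.0
```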
The Cyclicity metric relates to the number of nodes that lie on cycles, if the model has any. It is given by:
CYC = |Nc| / |N|, where Nc is the set of nodes that are part of cycles and N is the set of nodes.
In a model with no cycles, the CYC value is 0. A higher value means a higher risk of errors.
- CYC: cyclicity
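One simple way to find the nodes on cycles is to check, for each node, whether it can reach itself by following the flows. The sketch below does this with a depth-first search; it is an illustration under assumed names and input format, not necessarily how the component computes the value.

```python
def cyclicity(nodes, arcs) -> float:
    """Share of nodes that can reach themselves again by following arcs."""
    successors = {node: [] for node in nodes}
    for src, dst in arcs:
        successors[src].append(dst)

    def on_cycle(start) -> bool:
        stack = list(successors[start])
        seen = set()
        while stack:
            current = stack.pop()
            if current == start:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(successors[current])
        return False

    in_cycles = sum(1 for node in nodes if on_cycle(node))
    return in_cycles / len(nodes)


# B and C form a loop, A and D do not: 2 nodes out of 4
nodes = ["A", "B", "C", "D"]
arcs = [("A", "B"), ("B", "C"), ("C", "B"), ("C", "D")]
print(cyclicity(nodes, arcs))  # 0.5
```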
The Token Split metric adds up, over the Inclusive and Parallel Gateways (the only gateways that can introduce concurrency in a model), the number of outgoing flows minus one.
TS = SUM[c∈Cor∪Cand] (dout(c) − 1), where dout(c) is the number of outgoing flows of gateway c.
A model with higher concurrency value is more likely to have errors.
- TS: concurrency
What makes process models understandable?
by Mendling J, Reijers HA, Cardoso J (2007, doi = {10.1007/978-3-540-75183-0_4}), contained in Business Process Management, by Gustavo Alonso, Peter Dadam, Michael Rosemann (Eds.)
The Average Connector Degree is defined as the average number of incoming and outgoing sequence flows of all gateways and activities with at least two incoming or outgoing sequence flows.
- ACD: Average Connector Degree or Average Gateway Degree
The Maximum Connector Degree is defined as the sum of the incoming and outgoing sequence flows of the gateway or activity with the most incoming and outgoing sequence flows.
- MCD: Maximum Degree of a Connector or Maximum Gateway Degree
Complexity metrics for workflow nets
by Lassen KB, van der Aalst WM (2009, doi = {10.1016/j.infsof.2008.08.005})
The authors of this paper base their work on representing BP models as Petri Nets. The Extended Cardoso Metric is thus the same as the Control-Flow Complexity metric, but applied to Petri Nets. In our case this does not change anything, so the two metrics have the same value. This metric "penalizes each state by how many direct successor states it induces", thus obtaining a value that is linked to the model's complexity.
- ECaM: Extended Cardoso Metric
The Extended Cyclomatic Metric is given by the number of Sequence Flows minus the number of Flow Nodes plus the number of strongly connected components in the model. We obtain this last value by applying Tarjan's algorithm for strongly connected components. The aim of this metric is to evaluate the complexity of the behaviour that the model exhibits, "considering which states can the process be in and what transitions may occur".
- ECyM: Extended Cyclomatic Metric
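The following sketch shows the formula together with a compact recursive version of Tarjan's algorithm. It applies the computation directly to flow nodes and sequence flows, as described above; the function names and the list-based input are assumptions, and the recursion makes it suitable only for models of moderate size.

```python
def strongly_connected_components(nodes, arcs):
    """Tarjan's algorithm; returns a list of components (lists of node ids)."""
    successors = {node: [] for node in nodes}
    for src, dst in arcs:
        successors[src].append(dst)

    index, low = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(v):
        index[v] = low[v] = len(index)  # next free discovery index
        stack.append(v)
        on_stack.add(v)
        for w in successors[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for node in nodes:
        if node not in index:
            visit(node)
    return components


def extended_cyclomatic(nodes, arcs) -> int:
    """Sequence flows minus flow nodes plus strongly connected components."""
    return len(arcs) - len(nodes) + len(strongly_connected_components(nodes, arcs))


nodes = ["A", "B", "C", "D"]
arcs = [("A", "B"), ("B", "C"), ("C", "B"), ("C", "D")]
print(extended_cyclomatic(nodes, arcs))  # 4 - 4 + 3 = 3
```

In the example the components are {A}, {B, C} and {D}, because only B and C can reach each other.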
Proposal of square metrics for measuring business process model complexity
by Kluza K, Nalepa GJ (2012)
The next two metrics take into account the types of process elements and the number of elements of each type, as opposed to the simpler metrics. The Durfee Square Metric "equals d if there are d types of elements which occur at least d times in the model (each), and the other types of elements occur no more than d times (each)".
- DSM: Durfee Square Metric
The Perfect Square Metric, quoting the paper, is defined as follows: "given a set of element types ranked in decreasing order of the number of their instances, the PSM is the (unique) largest number such that the top p types occur (together) at least p^2 times".
- PSM: Perfect Square Metric
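Both square metrics only need the number of instances of each element type. The sketch below takes that list of counts as input; the function names are hypothetical, and it assumes that p cannot exceed the number of element types present in the model.

```python
def durfee_square(counts) -> int:
    """Largest d such that d element types occur at least d times each."""
    ranked = sorted(counts, reverse=True)
    d = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            d = rank
        else:
            break
    return d


def perfect_square(counts) -> int:
    """Largest p such that the top p types occur together at least p^2 times."""
    ranked = sorted(counts, reverse=True)
    p, running_total = 0, 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            p = rank
    return p


# e.g. 10 tasks, 4 gateways, 3 events, 1 data object
counts = [10, 4, 3, 1]
print(durfee_square(counts))   # 3: three types occur at least 3 times
print(perfect_square(counts))  # 4: all four types total 18 >= 16
```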
Investigating layout complexity
by Comber T, Maltby JR (1996)
This metric aims to measure the complexity of the model's graphical layout. It is based on Bonsiepe's technique, which consists in drawing contour lines around every type of object and deriving the complexity value C from the proportion of objects in each class. It is obtained as follows:
C = -N * SUM[i = 1..n] p_i * log2(p_i), with p_i = n_i / N
where:
- N = total number of objects (widths or heights, distance from top or side of page)
- n = number of classes (number of unique widths, heights or distances)
- n_i = number of objects in the ith class
- p_i = proportion of the ith class
Thus we get the Layout Complexity metric.
- Layout_complexity
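For a single measured property (for example the widths of the shapes), the formula can be sketched as follows. The function name and the list input are assumptions: each distinct value is treated as one class, and p_i is taken as the share of objects that fall in class i. How the component combines widths, heights and distances is not shown here.

```python
import math
from collections import Counter


def layout_entropy(values) -> float:
    """C = -N * sum(p_i * log2(p_i)) over the classes of equal values."""
    total = len(values)            # N: total number of objects
    classes = Counter(values)      # n_i: number of objects per distinct value
    return -total * sum(
        (count / total) * math.log2(count / total) for count in classes.values()
    )


# Four shapes with two distinct widths, two shapes per width
print(layout_entropy([100, 100, 80, 80]))  # 4.0
```

When every object falls in the same class the value is 0, and it grows as the objects spread over more classes.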
Guidelines on the aesthetic quality of UML class diagrams
by Eichelberger H, Schmid K (2009, doi = {10.1016/j.infsof.2009.04.008})
The purpose of this metric is to rate the understandability and readability of the model: the higher the value, the harder it is for an external user to read and understand the model correctly.
The metric is calculated in the following way:
Layout_measure = ni + nr + os
where:
- ni = number of intersecting sequence flows
- nr = number of non-rectilinear sequence flows
- os = number of overlapping shapes
Hence we get the Layout Measure metric.
- Layout_measure