Component_1 (Advanced Metrics Extractor)

Advanced Metrics Extractor

Component 1 (the Advanced Metrics Extractor from now on) is used to extract complex information from the input BPMN model. For the project's scope, a metric is defined as advanced if it is derived through computations on, and/or aggregations of, basic metrics and the model's elements.

Metrics of this type allow an in-depth analysis of the model, exposing a wide variety of information, ranging from graph-theoretic metrics to proportions between model elements.

Because all of these metrics derive from the work of BPM experts from all over the world, we group them according to the papers we used as sources for our studies and implementations.

Metrics from "Applying software metrics to evaluate business process models"

by Rolón E, Ruiz F, García F, Piattini M (2006)

Some of the metrics in this paper coincide with metrics already extracted by the Basic Metrics Extractor, but, for the sake of completeness, we list them anyway. They are self-explanatory.

TNT: total number of Tasks
TNCS: total number of Collapsed Subprocesses
TNA: total number of Activities
TNDO: total number of Data Objects
TNG: total number of Gateways
TNEE: total number of End Events
TNIE: total number of Intermediate Events
TNSE: total number of Start Events
TNE: total number of Events
TNSF: total number of Sequence Flows

Two other metrics measure the connectivity level of specific elements in the model, namely activities and participants (pools). CLA divides the number of activities by the number of sequence flows that connect them, while CLP divides the number of message flows by the number of participants.

CLA: connectivity level between activities (TNA/NSFA, where NSFA is the number of sequence flows between activities)
CLP: connectivity level between participants (NMF/NP, where NMF is the number of message flows and NP the number of participants)

The last four metrics measure various kinds of proportions between elements of the model. NDOIn and NDOOut are the numbers of data objects that are, respectively, inputs and outputs of activities, and NL is the number of pools/lanes.

PDOPin: proportion of data objects as incoming products and total data objects (NDOIn/TNDO)
PDOPout: proportion of data objects as outgoing products and total data objects (NDOOut/TNDO)
PDOTOut: proportion of data objects as outgoing products and total tasks (NDOOut/TNT)
PLT: proportion of pools/lanes and tasks (NL/TNT)
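The ratio metrics above are plain divisions over counts. A minimal Python sketch, assuming the counts (TNA, NSFA, NMF, NP, NDOIn, NDOOut, TNDO, TNT, NL) have already been extracted from the model; the names `rolon_ratios` and `safe_div`, and the choice of returning 0 when a denominator is zero, are illustrative assumptions rather than part of the original definitions.

```python
def safe_div(num: float, den: float) -> float:
    """num / den, or 0.0 when the denominator is zero (our assumption)."""
    return num / den if den else 0.0


def rolon_ratios(tna: int, nsfa: int, nmf: int, np_: int,
                 ndo_in: int, ndo_out: int, tndo: int,
                 tnt: int, nl: int) -> dict:
    """Hypothetical helper: ratio metrics of Rolón et al. from precomputed counts."""
    return {
        "CLA": safe_div(tna, nsfa),          # activities / sequence flows between activities
        "CLP": safe_div(nmf, np_),           # message flows / participants
        "PDOPin": safe_div(ndo_in, tndo),    # input data objects / all data objects
        "PDOPout": safe_div(ndo_out, tndo),  # output data objects / all data objects
        "PDOTOut": safe_div(ndo_out, tnt),   # output data objects / tasks
        "PLT": safe_div(nl, tnt),            # pools/lanes / tasks
    }
```

For example, `rolon_ratios(tna=6, nsfa=5, nmf=2, np_=2, ndo_in=1, ndo_out=3, tndo=4, tnt=5, nl=2)` gives CLA = 1.2 and PLT = 0.4.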

Metrics from "Control-flow complexity measurement of processes and Weyuker's properties"

by J. Cardoso (2007, doi = {10.1007/11837862_13})

Cardoso's first metric is the Control-Flow Complexity. It represents a weighted sum of all connectors that are used in a process model. In particular:

every Exclusive (split) Gateway's value corresponds to the number of its outgoing flows;
every Inclusive (split) Gateway's value corresponds to 2^n - 1, where n is the number of its outgoing flows;
every Parallel (split) Gateway's value corresponds to 1.

Other gateway types are not covered in the original source, so they are not considered. The complexity value affects the readability, maintainability, reliability and other properties of the model.

CFC: control-flow Complexity
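The weighted sum above can be sketched in a few lines. This assumes the split gateways have already been collected as hypothetical `(kind, fan_out)` pairs, with `kind` being `"xor"`, `"or"` or `"and"`; reading them from the BPMN file is not shown.

```python
def control_flow_complexity(splits) -> int:
    """splits: iterable of (kind, fan_out) pairs, one per split gateway.

    kind is a hypothetical label: "xor" (exclusive), "or" (inclusive), "and" (parallel).
    """
    total = 0
    for kind, fan_out in splits:
        if kind == "xor":
            total += fan_out              # exactly one branch is taken
        elif kind == "or":
            total += 2 ** fan_out - 1     # any non-empty subset of branches
        elif kind == "and":
            total += 1                    # all branches are always taken together
        # other gateway types are ignored, as described above
    return total
```

For instance, `control_flow_complexity([("xor", 3), ("or", 2), ("and", 4)])` evaluates to 3 + 3 + 1 = 7.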

Metrics from "A Discourse on Complexity of Process Models"

by J. Cardoso, J. Mendling, G. Neumann, H.A. Reijers (2006, doi = {10.1007/11837862_13}), contained in Business Process Management Workshops, Johann Eder, Schahram Dustdar (Eds.), chapter 13, pp. 117-128

Three metrics from this paper are based on the number of activities and gateways in the model.

NOA: number of Activities
NOAC: number of Activities and Control-Flow elements
NOAJS: number of Activities, Joins and Splits

Three other metrics are based on the work of Halstead, whose measures are among the most important in the field of software complexity. They are based on four values; we report both their original meaning and their meaning in the BPMN field:

n1 = number of unique operators => number of unique activities and control-flow elements
n2 = number of unique operands => number of unique data variables
N1 = total number of operator occurrences => total number of activities and control-flow elements
N2 = total number of operand occurrences => total number of data variables

From these numbers we can derive the Halstead-based Process Complexity (HPC) measures for process length, volume and difficulty. They are calculated as follows:

Process Length: N = n1*log2(n1) + n2*log2(n2)
Process Volume: V = (N1+N2)*log2(n1+n2)
Process Difficulty: D = (n1/2)*(N2/n2)

Thus we get three metrics:

HPC_D: Halstead-based Process Complexity (process difficulty)
HPC_N: Halstead-based Process Complexity (process length)
HPC_V: Halstead-based Process Complexity (process volume)
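The three formulas translate directly into code. In this sketch the four Halstead counts are assumed to be already available, and terms whose count is zero are taken as 0 to avoid log2(0) or a division by zero (an assumption of ours, since empty models are not discussed in the paper).

```python
import math


def _n_log2_n(x: int) -> float:
    # x * log2(x), with the convention 0 * log2(0) = 0 (assumption)
    return x * math.log2(x) if x > 0 else 0.0


def halstead_process_complexity(n1: int, n2: int, N1: int, N2: int) -> dict:
    """n1/n2: unique activities and control-flow elements / unique data variables;
    N1/N2: their total occurrences."""
    length = _n_log2_n(n1) + _n_log2_n(n2)                              # HPC_N
    volume = (N1 + N2) * math.log2(n1 + n2) if n1 + n2 > 0 else 0.0     # HPC_V
    difficulty = (n1 / 2) * (N2 / n2) if n2 > 0 else 0.0                # HPC_D
    return {"HPC_N": length, "HPC_V": volume, "HPC_D": difficulty}
```

With n1 = 4, n2 = 2, N1 = 8 and N2 = 4, the sketch gives HPC_N = 4·2 + 2·1 = 10 and HPC_D = (4/2)·(4/2) = 4.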

The paper also discusses a software complexity metric that is based on the impact of information flow on a program's structure. Adapted to BPM, it yields the Interface Complexity (IC), defined as:

IC = Length * (number of inputs * number of outputs)^2

When computing the complexity of software, length is the number of lines of code (LOC), and the numbers of inputs and outputs represent the local information flows entering and leaving. For BPM models, instead, the length of an activity is 1 if it is treated as a black box, while it is its LOC if it is treated as a white box. We always treat activities as black boxes, so the total length equals the number of activities in the model; fan-in and fan-out are the numbers of Data Input and Data Output Associations. The four metrics that we obtain are:

NoI: number of Activities inputs (Fan-In)
NoO: number of Activities outputs (Fan-Out)
Length: activities length (number of activities)
IC: Interface Complexity of Activities

The last metric discussed in the paper is NOF, the number of arcs present in the model.

NOF: number of Control Flow connections (number of arcs)

Metrics from "Prediction Models for BPMN Usability and Maintainability"

by Rolón E, Sanchez L, Garcia F, Ruiz F, Piattini M, Caivano D, Visaggio G (2009, doi = {10.1007/11837862_13})

This paper presents the total number of sequence flows metric, which equals the NOF metric and is already extracted by the Basic Metrics Extractor.

TNSF: total number of Sequence Flows

Metrics from "On a quest for good process models: the cross-connectivity metric"

by Vanderfeesten I, Reijers HA, Mendling J, van der Aalst WM, Cardoso J (2008, doi = {10.1007/978-3-540-69534-9_36})

The Cross-Connectivity metric aims to "measure the strength of the links between process model elements", and thus the complexity of the mental operations that readers must perform to understand the model. It is based on the "weakest-link metaphor": what matters most is the hardest part of the model to understand. Models with a lower CC value are more prone to contain errors, because they are harder to understand. To compute this value, we first calculate the weight of every node in the model. Let d be the degree of a node (the number of its incoming and outgoing flows):

if the node is an Exclusive Gateway, its weight is 1/d;
if it's an Inclusive Gateway, its weight is 1 / (2^d - 1) + ((2^d - 2) / (2^d - 1)) * (1 / d);
otherwise, it is 1

The paper does not explicitly consider every kind of BPMN element, so we assign a weight of 1 to the node types it does not mention. Once every node has a weight, we calculate the weight of each arc as the product of the weights of its source node and its target node.

W(a) = w(src(a)) · w(dest(a))

With the weight of every arc in the model, we can obtain the value of every path. A path is the sequence of arcs that should be followed to get from a node n1 to a node n2. Its value is the product of the weights of every arc in the path.

v(p) = W(a1) ·W(a2) · ... ·W(ax)

The value of the connection between any given pair of nodes n1 and n2 is the maximum value among the paths from n1 to n2. If the nodes are not connected, the value of the connection is 0.

V(n1, n2) = max[p ∈ P(n1, n2)] v(p), where P(n1, n2) is the set of paths from n1 to n2

Finally, with the values of the connections between every pair of nodes in the model, we can obtain the Cross-Connectivity value. It is defined as follows:

CC = Sum[n1,n2∈N]V(n1, n2) / (|N| · (|N| − 1))

CC: Cross-Connectivity
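The whole computation can be sketched as follows, assuming the model is given as a hypothetical dict mapping node ids to kinds (`"xor"`, `"or"`, or anything else for weight 1) plus a list of `(source, target)` arcs. Two further assumptions: every gateway has at least one flow (so d > 0), and the sum runs over ordered pairs of distinct nodes, matching the |N|·(|N| − 1) denominator. Because every node weight, and hence every arc weight, is at most 1, the maximum-value path can be found with a Floyd-Warshall pass that maximises products instead of minimising sums.

```python
def node_weight(kind: str, degree: int) -> float:
    """Weight of a node given its hypothetical kind label and degree d (> 0 for gateways)."""
    if kind == "xor":
        return 1 / degree
    if kind == "or":
        combos = 2 ** degree - 1
        return 1 / combos + ((2 ** degree - 2) / combos) * (1 / degree)
    return 1.0  # every other node type, including parallel gateways


def cross_connectivity(nodes: dict, arcs: list) -> float:
    """nodes: id -> kind; arcs: list of (source, target) pairs."""
    ids = list(nodes)
    n = len(ids)
    if n < 2:
        return 0.0
    degree = {v: 0 for v in ids}
    for s, t in arcs:
        degree[s] += 1
        degree[t] += 1
    w = {v: node_weight(nodes[v], degree[v]) for v in ids}

    # V[a][b]: best path value found so far from a to b (0 means "not connected")
    V = {a: {b: 0.0 for b in ids} for a in ids}
    for s, t in arcs:
        V[s][t] = max(V[s][t], w[s] * w[t])  # arc weight W(a)

    # Floyd-Warshall variant that maximises products of arc weights
    for k in ids:
        for a in ids:
            for b in ids:
                candidate = V[a][k] * V[k][b]
                if candidate > V[a][b]:
                    V[a][b] = candidate

    total = sum(V[a][b] for a in ids for b in ids if a != b)
    return total / (n * (n - 1))
```

For a chain of three tasks A → B → C, all three connections have value 1, so the sketch returns 3 / 6 = 0.5.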

Metrics from "Quality metrics for business process modeling"

by Khlif W, Makni L, Zaaboub N, Ben-Abdallah H (2009)

The paper aims to adapt object-oriented software metrics to BPMN models. Besides some metrics that we have already covered, like the Halstead-based ones or IC, the paper defines the Imported Coupling of a Process and the Exported Coupling of a Process. These metrics provide a quality value that represents the coupling of a model. ICP is the sum of the outgoing flows, and ECP the sum of the incoming flows, of each task in the process, including the tasks contained in its subprocesses.

ICP: Imported Coupling of a Process
ECP: Exported Coupling of a Process

Metrics from "Adopting the Cognitive Complexity Measure for Business Process Models"

by Gruhn V, Laue R (2006, doi = {10.1109/COGINF.2006.365702})

Like others we have covered, this paper aims to provide a metric that measures the understandability and maintainability of a business process model. To this end, the authors build on studies of the cognitive weights of basic control structures in programming and adapt them to BPMN structures, obtaining the Cognitive Weight metric. According to the paper, there are eight types of structures that can be found in a model, each with a different weight:

Sequence: a sequence of simple consecutive steps -> Weight: 1
Exclusive Choice 1: Exclusive split Gateways with 2 branches -> Weight: 2
Exclusive Choice 2: Exclusive split Gateways with more than 2 branches -> Weight: 3
Parallel Split and Synchronization: Parallel Gateways -> Weight: 4
Multiple Choice and Synchronizing Merge: Inclusive Gateways -> Weight: 7
User-defined Function: Subprocesses -> Weight: 2
Multiple Instances Patterns: Multiple Instance Loop Characteristics -> Weight: 6
Cancel Activity: Cancel Events -> Weight: 1

The paper also mentions another type of structure, the Cancel Case, a cancellation that deactivates all elements within another part of the model. We couldn't find an equivalent in the BPMN notation, so we did not implement it. The Cognitive Weight of a model is the sum of the weights of all the structures present in it.

W: Cognitive Weight
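Once the structures have been identified, the Cognitive Weight is a weighted count. This sketch assumes the structures have already been detected and counted under hypothetical labels of our own; detecting them in the model is the hard part and is not shown.

```python
# Weights from the list above, keyed by hypothetical structure labels.
COGNITIVE_WEIGHTS = {
    "sequence": 1,
    "xor_2": 2,            # Exclusive Choice 1: exclusive split with 2 branches
    "xor_many": 3,         # Exclusive Choice 2: exclusive split with > 2 branches
    "parallel": 4,         # Parallel Split and Synchronization
    "inclusive": 7,        # Multiple Choice and Synchronizing Merge
    "subprocess": 2,       # User-defined Function
    "multi_instance": 6,   # Multiple Instances Patterns
    "cancel_activity": 1,  # Cancel Activity
}


def cognitive_weight(structure_counts: dict) -> int:
    """structure_counts: label -> number of occurrences of that structure in the model."""
    return sum(COGNITIVE_WEIGHTS[label] * count
               for label, count in structure_counts.items())
```

For example, three sequences, one two-branch exclusive choice and one parallel split give 3·1 + 2 + 4 = 9.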

Metrics from "Complexity metrics for business process models"

by Gruhn V, Laue R (2006)

The Nesting Depth of a node is "the number of decisions in the control flow that are necessary to perform this action". The authors state that this value affects the overall complexity of the model: the greater the nesting depth, the greater the complexity. This leads to the two metrics presented in the paper: the Maximum Nesting Depth and the Mean Nesting Depth. In our implementation, the Nesting Depth is incremented only at Exclusive and Complex Gateways. Since the paper does not cover this case, when different paths to the same node yield different Nesting Depth values, we adopt the minimum among them.

MaxND: Maximum nesting depth
MeanND: Mean nesting depth

Metrics from "Cohesion and coupling metrics for workflow process design"

by Reijers HA, Vanderfeesten IT (2004, doi = {10.1007/978-3-540-25970-1_19})

"The coupling metric determines the number of related activities for each activity." It is given by the number of pairs of activities connected by a sequence flow, divided by the number of activities times the maximum coupling of a single activity (the number of activities minus 1):

CP = Sum[s,t∈T]connected(s, t) / (|T|*(|T|-1)), where T is the set of the activities in the model

CP: Coupling
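A minimal sketch of CP, assuming activities are given as a list of ids and sequence flows as `(source, target)` pairs. We also assume that connected(s, t) is symmetric, i.e. it is 1 when a sequence flow links s and t in either direction; the formula alone does not settle this point.

```python
def coupling(activities: list, sequence_flows: list) -> float:
    """activities: activity ids; sequence_flows: (source, target) pairs."""
    acts = set(activities)
    n = len(acts)
    if n < 2:
        return 0.0
    connected = set()
    for s, t in sequence_flows:
        if s in acts and t in acts and s != t:  # only activity-to-activity flows count
            connected.add((s, t))
            connected.add((t, s))  # assumption: connected(s, t) is symmetric
    return len(connected) / (n * (n - 1))
```

With activities A, B, C and flows A → B, B → C there are four connected ordered pairs out of six, so CP = 2/3.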

Metrics from "Finding a complexity measure for business process models"

by Latva-Koivisto AM (2001)

CNC: Coefficient of Network Complexity or Connectivity coefficient (number of arcs divided by number of nodes)

Metrics from "Metrics for Process Models"

by Jan Mendling (2008), chapter 4

This book is probably the most complete and precise source on the analysis and metrics of BPMN models; indeed, some of what we consider the "main" metrics come from it. It mostly analyses process models as graphs, so some graph theory is involved. We divide the metrics following the structure of the book (and thus of our classes).

Size Metrics

The Size of a model is simply the number of nodes in it. A model with a larger size is more likely to contain errors than a smaller one.

Sn: size

The Diameter of a model is "the length of the longest path from a start node to an end node". As with Size, a model with a greater diameter is more likely to contain errors than one with a smaller diameter.

diam: diameter
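A minimal sketch for acyclic models only, assuming nodes are ids, arcs are `(source, target)` pairs, and path length is counted in arcs (both assumptions). In a model with cycles the longest path is not well defined without extra rules, so the sketch raises an error in that case. It orders nodes topologically with Kahn's algorithm and then relaxes longest distances from the start nodes.

```python
from collections import defaultdict, deque


def diameter(nodes, arcs, starts, ends) -> int:
    """Longest start-to-end path, counted in arcs, for an acyclic model."""
    nodes = list(nodes)
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for s, t in arcs:
        succ[s].append(t)
        indeg[t] += 1

    # Kahn's algorithm: topological order of the nodes
    order = []
    queue = deque(v for v in nodes if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        order.append(v)
        for t in succ[v]:
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    if len(order) != len(nodes):
        raise ValueError("the model contains a cycle; this sketch handles acyclic models only")

    # Longest distance from any start node, relaxed in topological order
    unreachable = float("-inf")
    dist = {v: (0 if v in starts else unreachable) for v in nodes}
    for v in order:
        if dist[v] == unreachable:
            continue
        for t in succ[v]:
            dist[t] = max(dist[t], dist[v] + 1)

    best = max((dist[e] for e in ends), default=unreachable)
    return 0 if best == unreachable else int(best)
```

For start → A → B → end plus a shorter branch start → C → end, the sketch returns 3.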

Density Metrics

According to Mendling, the error probability of a model increases with its Density. Density is the number of arcs (flows) divided by the number of nodes times the number of nodes minus 1, i.e. by the maximum possible number of arcs:

Δ(G) = |A| / (|N| * (|N| - 1)), where A is the set of the arcs and N the set of the nodes

Δ(G): density

Partitionability Metrics

The Sequentiality represents the presence of simple consecutive nodes in the model, this being the simplest structure a model can present. A process with high Sequentiality should be less prone to errors. It can be obtained as follows:

Ξ(G) = |A ∩ (T × T)| / |A|, i.e. the number of arcs between non-connector nodes (the set T) divided by the total number of arcs

If every arc connects only non-connector nodes, the Sequentiality is 1.

Ξ(G): sequentiality
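A minimal sketch, assuming nodes map ids to hypothetical kind labels where `"xor"`, `"or"` and `"and"` are the connectors (gateways) and every other kind belongs to T; returning 0 for a model with no arcs is our own choice.

```python
CONNECTORS = {"xor", "or", "and"}  # hypothetical gateway labels


def sequentiality(nodes: dict, arcs: list) -> float:
    """nodes: id -> kind; arcs: (source, target) pairs."""
    if not arcs:
        return 0.0  # our choice for a model without arcs
    plain = sum(1 for s, t in arcs
                if nodes[s] not in CONNECTORS and nodes[t] not in CONNECTORS)
    return plain / len(arcs)
```

For start → A → X, X → B, X → C, with X an exclusive gateway, only the first of the four arcs joins two non-connectors, so Ξ = 0.25.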

The Depth of a node n is based on two values: the in-depth λin(n) and the out-depth λout(n). They represent, respectively, the number of split nodes minus the number of join nodes that have to be visited to get from a start node to n, and the number of join nodes minus the number of split nodes that have to be visited to get from n to an end node. The in-depth of a node is computed from the in-depth of its predecessors, and the out-depth from the out-depth of its successors. The depth λ(n) of a node is the minimum of its two depths, and the depth Λ of a model is the maximum depth among its nodes. The higher the depth, the higher the probability of errors in the model.

Λ: depth

Connector Interplay Metrics

The Connector Mismatch metric counts the mismatches between split and join gateways in the model. A model with a high Connector Mismatch value is likely to include more errors, because mismatched splits and joins can cause problems in the handling of tokens.

MM: connector mismatch

The Connector Heterogeneity metric gives the entropy over the connector types; in other words, it reflects how many different types of gateways are used in the model. A higher Connector Heterogeneity can lead to a higher chance of mismatches, and thus to more errors. This value can be obtained as follows:

CH(G) = −SUM[g∈{and,xor,or}] p(g) * log3(p(g)), where p(g) is the relative frequency of gateways of type g among all gateways (the base-3 logarithm keeps the value between 0 and 1)

CH: connector heterogeneity
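A minimal sketch, assuming the gateways are given as a list of hypothetical type labels; gateways of other types are ignored, and a model without gateways gets 0 (our assumption).

```python
import math
from collections import Counter


def connector_heterogeneity(gateway_kinds) -> float:
    """gateway_kinds: one hypothetical label ("and", "xor", "or") per gateway."""
    counts = Counter(k for k in gateway_kinds if k in ("and", "xor", "or"))
    total = sum(counts.values())
    if total == 0:
        return 0.0  # our choice for a model without gateways
    entropy = 0.0
    for c in counts.values():  # only types that actually occur, so p > 0
        p = c / total
        entropy -= p * math.log(p, 3)
    return entropy
```

One gateway of each type gives the maximum value 1, while gateways of a single type give 0.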

Cyclicity Metrics

The Cyclicity metric relates to the number of nodes that lie on cycles in the model, if any. It is given by:

CYC = |Nc|/|N|, where Nc is the set of nodes that are part of cycles and N is the set of all nodes.

In a model with no cycles, the CYC value is 0. A higher value means a higher risk of errors.

CYC: cyclicity
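A minimal sketch, assuming nodes are ids and arcs are `(source, target)` pairs. A node lies on a cycle exactly when it can be reached again from one of its own successors, which a breadth-first search per node checks; this is quadratic in the worst case, which we assume is acceptable for model-sized graphs.

```python
from collections import defaultdict, deque


def cyclicity(nodes, arcs) -> float:
    """nodes: iterable of ids; arcs: (source, target) pairs."""
    nodes = list(nodes)
    if not nodes:
        return 0.0
    succ = defaultdict(list)
    for s, t in arcs:
        succ[s].append(t)

    def on_cycle(v) -> bool:
        # v lies on a cycle iff a breadth-first search from its successors reaches v again
        seen = set()
        queue = deque(succ[v])
        while queue:
            u = queue.popleft()
            if u == v:
                return True
            if u not in seen:
                seen.add(u)
                queue.extend(succ[u])
        return False

    return sum(1 for v in nodes if on_cycle(v)) / len(nodes)
```

For start → A → B → A, B → end, nodes A and B are on the cycle, so CYC = 2/4 = 0.5.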

Concurrency Metrics

The Token Split metric sums, over the Inclusive and Parallel Gateways (the only gateways that can introduce concurrency in a model), the number of outgoing flows minus one, i.e. the number of additional tokens each of them can create.

TS = SUM[c∈Cor∪Cand] (dout(c) − 1), where dout(c) is the number of outgoing flows of gateway c.

A model with a higher Token Split value is more likely to have errors.

TS: concurrency
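A minimal sketch, assuming gateways are given as hypothetical `(kind, outgoing_flows)` pairs and that each parallel (`"and"`) or inclusive (`"or"`) gateway contributes its outgoing flows minus one, as in Mendling's formulation; joins, with a single outgoing flow, then add nothing.

```python
def token_split(gateways) -> int:
    """gateways: hypothetical (kind, outgoing_flows) pairs, e.g. ("and", 3)."""
    return sum(max(out - 1, 0)  # extra tokens created by one split
               for kind, out in gateways
               if kind in ("and", "or"))
```

For `[("and", 3), ("or", 2), ("xor", 4), ("and", 1)]` the result is 2 + 1 = 3: the exclusive gateway and the parallel join contribute nothing.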

Metrics from "What makes process models understandable?"

by Mendling J, Reijers HA, Cardoso J (2007, doi = {10.1007/978-3-540-75183-0_4}), contained in Business Process Management, Gustavo Alonso, Peter Dadam, Michael Rosemann (Eds.)

The Average Connector Degree is defined as the average number of incoming and outgoing sequence flows of all gateways and activities with at least two incoming or outgoing sequence flows.

ACD: Average Connector Degree or Average Gateway Degree

The Maximum Connector Degree is defined as the sum of the incoming and outgoing sequence flows of the gateway or activity with the most incoming and outgoing sequence flows.

MCD: Maximum Degree of a Connector or Maximum Gateway Degree
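Both degree metrics can be computed in one pass. This sketch assumes nodes map ids to hypothetical kind labels, that activities are labelled `"task"` or `"subprocess"` and gateways `"xor"`, `"or"` or `"and"`, and, as our own choice, that MCD is taken over the same set of nodes as ACD.

```python
ELIGIBLE = {"task", "subprocess", "xor", "or", "and"}  # hypothetical kind labels


def connector_degrees(nodes: dict, arcs: list):
    """Return (ACD, MCD) for nodes (id -> kind) and sequence flows (source, target)."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for s, t in arcs:
        outdeg[s] += 1
        indeg[t] += 1
    # gateways and activities with at least two incoming or two outgoing flows
    degrees = [indeg[v] + outdeg[v] for v, kind in nodes.items()
               if kind in ELIGIBLE and (indeg[v] >= 2 or outdeg[v] >= 2)]
    if not degrees:
        return 0.0, 0
    return sum(degrees) / len(degrees), max(degrees)
```

With an exclusive split of degree 4 and an exclusive join of degree 3, ACD = 3.5 and MCD = 4.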

Metrics from "Complexity metrics for Workflow nets"

by Lassen KB, van der Aalst WM (2009, doi = {10.1016/j.infsof.2008.08.005})

The authors of this paper base their work on representing BP models as Petri Nets. As such, the Extended Cardoso Metric is the same as the Control-Flow Complexity metric, but applied to Petri Nets. In our case this makes no difference, so the two metrics have the same value. This metric "penalizes each state by how many direct successor states it induces", thus obtaining a value linked to the model's complexity.

ECaM: Extended Cardoso Metric

The Extended Cyclomatic Metric is given by the number of Sequence Flows minus the number of Flow Nodes plus the number of strongly connected components in the model. We obtain the latter by applying Tarjan's algorithm for strongly connected components. The aim of this metric is to evaluate the complexity of the behaviour the model exhibits, "considering which states can the process be in and what transitions may occur".

ECyM: Extended Cyclomatic Metric
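A minimal sketch of ECyM, assuming nodes are ids and arcs are `(source, target)` pairs, with a recursive implementation of Tarjan's algorithm to count the strongly connected components. Python's default recursion limit makes the recursive form suitable only for models of moderate size; an iterative version would avoid that.

```python
def count_sccs(nodes, arcs) -> int:
    """Number of strongly connected components, via recursive Tarjan's algorithm."""
    succ = {v: [] for v in nodes}
    for s, t in arcs:
        succ[s].append(t)
    index, low = {}, {}
    stack, on_stack = [], set()
    counter = 0
    components = 0

    def strongconnect(v):
        nonlocal counter, components
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:            # tree edge: explore w first
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:           # edge back into the current component
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component: pop it
            while True:
                w = stack.pop()
                on_stack.discard(w)
                if w == v:
                    break
            components += 1

    for v in succ:
        if v not in index:
            strongconnect(v)
    return components


def extended_cyclomatic(nodes, arcs) -> int:
    nodes = list(nodes)
    return len(arcs) - len(nodes) + count_sccs(nodes, arcs)
```

For start → A → B → A, B → end there are 4 flows, 4 nodes and 3 components ({start}, {A, B}, {end}), so ECyM = 3.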

Metrics from "Proposal of square metrics for measuring business process model complexity"

by Kluza K, Nalepa GJ (2012)

The next two metrics aim to take into account both the types of process elements and their number, as opposed to simpler metrics. The Durfee Square Metric "equals d if there are d types of elements which occur at least d times in the model (each), and the other types of elements occur no more than d times (each)".

DSM: Durfee Square Metric

The Perfect Square Metric is defined, quoting the paper, as follows: "given a set of element types ranked in decreasing order of the number of their instances, the PSM is the (unique) largest number such that the top p types occur (together) at least p^2 times".

PSM: Perfect Square Metric
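Both square metrics only need the number of instances of each element type. A minimal sketch, assuming those counts are already available as a list of integers (the element types themselves do not matter once ranked).

```python
def durfee_square(counts) -> int:
    """Largest d such that d element types occur at least d times each."""
    ranked = sorted(counts, reverse=True)
    d = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            d = i
        else:
            break  # counts only decrease from here on
    return d


def perfect_square(counts) -> int:
    """Largest p such that the p most frequent types occur at least p^2 times together."""
    ranked = sorted(counts, reverse=True)
    p, running = 0, 0
    for i, c in enumerate(ranked, start=1):
        running += c
        if running >= i * i:
            p = i
    return p
```

For counts [10, 6, 3, 2, 1], three types occur at least three times each, so DSM = 3, while the top four types occur 21 ≥ 16 times together but the top five only 22 < 25, so PSM = 4.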

Metrics from "Investigating layout complexity"

by Comber T, Maltby JR (1996)

This metric aims to measure the complexity of the model's graphical layout. It is based on Bonsiepe's technique, which consists of drawing contour lines around every type of object and computing the complexity value C from the proportion of objects in each class. It is obtained as follows:

C = -N * SUM[i = 1..n] p_i * log2(p_i), with p_i = n_i / N

where:

N = total number of objects (widths or heights, distance from top or side of page)
n = number of classes (number of unique widths, heights or distances)
n_i = number of objects in the ith class
p_i = proportion of the ith class

Thus we get the Layout Complexity metric.

Layout_complexity
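A minimal sketch for one measured property (for example, shape widths), assuming each distinct value forms one class and that p_i is the share of the N objects falling into class i. Reading the values from the diagram interchange data of the BPMN file is not shown.

```python
import math
from collections import Counter


def layout_complexity(values) -> float:
    """values: one measurement per object (e.g. every shape's width)."""
    values = list(values)
    N = len(values)
    if N == 0:
        return 0.0
    entropy = 0.0
    for n_i in Counter(values).values():  # each distinct value is one class
        p_i = n_i / N
        entropy -= p_i * math.log2(p_i)
    return N * entropy
```

Widths [100, 100, 80, 80] form two equally sized classes, so C = 4 · 1 = 4; identical widths give 0.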

Metrics from "Guidelines on the aesthetic quality of UML class diagrams"

by Eichelberger H, Schmid K (2009, doi = {10.1016/j.infsof.2009.04.008})

The purpose of this metric is to measure the model's understandability and readability: the higher the value, the harder it is for an external user to read and correctly comprehend the model.

The metric is calculated in the following way:

Layout_measure = ni + nr + os

where:

ni = number of intersecting sequence flows
nr = number of non-rectilinear sequence flows
os = number of overlapping shapes

Hence we get the Layout Measure metric.

Layout_measure
