From 11e6a4e2f3fdb6154e2fb27f64c2f2aed00bac6a Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Fri, 28 Jun 2019 14:09:31 -0700 Subject: [PATCH 01/16] Remove remaining assorted plaintext instruction docs --- docs/UserWriteUp.txt | 174 ------------------------------------------- utils/README | 90 ---------------------- 2 files changed, 264 deletions(-) delete mode 100644 docs/UserWriteUp.txt delete mode 100644 utils/README diff --git a/docs/UserWriteUp.txt b/docs/UserWriteUp.txt deleted file mode 100644 index 485e559..0000000 --- a/docs/UserWriteUp.txt +++ /dev/null @@ -1,174 +0,0 @@ -This is a work in progress and will eventually be converted to a more readable -format. - -TraceR is a replay tool targeted to simulate control flow of application on -prototype systems, i.e., if control flow of an application, which includes -expected computation tasks, communication routines, and their dependencies, is -provided to TraceR, it will mimic the flow on a hypothetical system with a given -compute and communication capability. As of now, the control flow is captured by -either emulating applications using BigSim or by linking with Score-P. CODES -is used for simulating the communication on the network. - -Expected work flow: - -1) Write an MPI application. (Avoid global variables so that the application be -run with virtualization if using BigSim). - -If using BigSim follows steps 2-4, else follow step 5. -2) Compile BigSim/Charm++ for emulation. 
Use any one of the following commands: - -- To use UDP as BigSim/Charm++'s communication layer: - ./build bgampi net-linux-x86_64 bigemulator --with-production --enable-tracing - ./build bgampi net-darwin-x86_64 bigemulator --with-production --enable-tracing - - or explicitly provide the compiler optimization level - ./build bgampi net-linux-x86_64 bigemulator -O2 - -- To use MPI as BigSim/Charm++'s communication layer: - ./build bgampi mpi-linux-x86_64 bigemulator --with-production --enable-tracing - -Note that this build is used to compile MPI applications so that traces can be -generated. Hence, the communication layer used by BigSim/Charm++ is not -important. During simulation, the communication will be replayed using the -network simulator from CODES. However, the computation time captured here can be -important if it is not being explicitly replaced at simulation time using -configuration options. So using appropriate compiler flags is important. - -3) Compile the MPI application from Step 1 using BigSim/Charm++ from Step 2. - -Example commands: -$CHARM_DIR/bin/ampicc -O2 simplePrg.c -o simplePrg_c -$CHARM_DIR/bin/ampiCC -O2 simplePrg.cc -o simplePrg_cxx - -4) Emulation to generate traces. When the binary generated in Step 3 is run, -BigSim/Charm++ runs the program on the allocated cores as if it would run in the -usual case. Users should provide a few additional arguments to specify the -number of MPI processes in the prototype systems. - -If using UDP as the BigSim/Charm++'s communication layer: -./charmrun +p ++nodelist ./pgm +vp +x +y +z +bglog - -If using MPI as the BigSim/Charm++'s communication layer: -mpirun -n ./pgm +vp +x +y +z +bglog - -Number of real processes is typically equal to the number cores the emulation -is being run on. - -machine file is the list of systems the emulation should be run on (similar to -machine file for MPI; refer to Charm++ website for more details). - -vp is the number of MPI ranks that are to be emulated. 
For simple tests, it can -be same as the number of real processes, in which case one MPI rank is run on -each real processes (as it happens when a regular program is run). When the -number of vp (virtual processes) is higher, BigSim launches user level threads -to execute multiple MPI ranks with a process. - -+x +y +z defines a 3D grid of the virtual processes. The product of these three -dimensions must match the number of vp's. These arguments do not have any -effect on the emulation, but exist due to historical reasons. - -+bglog instructs bigsim to write the logs to files. - -When this run finished, you should see many files named bgTrace* in the -directory. The total number of such files equals the number of real processes -plus one. Their names are bgTrace, bgTrace0, bgTrace1, so on. - -Create a new folder and move all bgTrace to that folder. - -5) Following instructions in README.OTF to generate OTF2 traces. - -6) Simulation. To run a simulation, 2 files are needed: a tracer config file, -and a codes config file. Optionally, mapping files can also be provided. - -Tracer config file: sample found at examples/jacobi2d-bigsim/tracer_config (BigSim) or examples/stencil4d-otf/tracer_config (OTF) Format (expected content on each line of the file): - - - - -... -``` -If is not needed, use NA for it and . -For generating simple global and job map file, use the code in utils. - -CODES config files: samples in examples/conf - -Additional documentation on format of the CODES config file can be found in the -CODES wiki at https://xgitlab.cels.anl.gov/codes/codes/wikis/home - -Brief summary follows: - -LPGROUPS, MODELNET_GRP, PARAMS are keywords and should be used as is. - -MODELNET_GRP: -repetition = number of routers that have nodes connecting to them. - -server = number of MPI processes/cores per router - -modelnet_* = number of NICs. 
For torus, this value has to be 1; for dragonfly, -it should be router radix divided by 4; for the fat-tree, it should be router -radix divided by 2. For the dragonfly network, modelnet_dragonfly_router should -also be specified (as 1). For express mesh, modelnet_express_mesh_router should -also be specified as 1. - -Similarly, the fat-tree config file requires specifying fattree_switch which -can be 2 or 3, depending on the number of levels in the fat-tree. Note that the -total number of cores specified in the CODES config file can be greater than -the number of MPI processes being simulated (specified in the tracer config -file). - -Other common parameters: -packet_size/chunk_size (both should have the same value): size of the packets -created by NIC for transmission on the network. Smaller the packet size, longer -the time for which simulation will run (in real time). Larger the packet size, -the less accurate the predictions are expected to be (in virtual time). Packet -sizes of 512 bytes to 4096 bytes are commonly used. - -modelnet_order = torus/dragonfly/fattree/slimfly/express_mesh - -modelnet_scheduler = -fcfs : packetize messages one by one. -round-robin : packetize message in a round robin manner. - -message_size = PDES parameter (keep constant at 512) - -router_delay = delay at each router for packet transmission (in nano seconds) - -soft_delay = delay caused by software stack such as that of MPI (in nano -seconds) - -link_bandwidth = bandwidth of each link in the system (in GB/s) - -cn_bandwidth = bandwidth of connection between NIC and router (in GB/s) - -buffer_size/vc_size = size of channels used to store transient packets at routers (in -bytes). Typical value is 64*packet_size. - -routing = how are packets being routed. 
Options depend on the network: -torus = static/adaptive -dragonfly = minimal/nonminimal/adaptive -fat-tree = adaptive/static - -Network specific parameters: - -Torus: n_dims - number of dimensions in the torus -dim_length - length of each dimension - -Dragonfly: num_routers - number of routers within a group. -global_bandwidth - bandwidth of the links that connect groups. - -Fat-tree: ft_type - always choose 1 -num_levels - number of levels in the fat-tree (2 or 3) -switch_radix - radix of the switch being used -switch_count - number of switches at leaf level. - -Publications that describe implementation of TraceR in detail: -Nikhil Jain, Abhinav Bhatele, Sam White, Todd Gamblin, and Laxmikant Kale. -Evaluating HPC Networks via Simulation of Parallel Workloads. SC 2016. - -Bilge Acun, Nikhil Jain, Abhinav Bhatele, Misbah Mubarak, Christopher Carothers, -Laxmikant Kale. Preliminary Evaluation of a Parallel Trace Replay Tool for HPC -Network Simulations. Workshop on Parallel and Distributed Agent-Based -Simulations at EURO-PAR 2015. - -More details can be found in Chapter 5 of this thesis: -http://charm.cs.illinois.edu/newPapers/16-02/Jain_Thesis.pdf diff --git a/utils/README b/utils/README deleted file mode 100644 index aad1306..0000000 --- a/utils/README +++ /dev/null @@ -1,90 +0,0 @@ -Ranking basics: ---------------------- -TraceR requires two sets of mapping files (with some what redundant information). -Both types files provide information about mapping of global rank to jobs and -their local rank. Global rank of a server/core is simply the logical rank that -server LPs get inside CODES. It increases linearly from servers/cores connected -to one switch to another. Due to the way default server to node mapping works -within CODES, if more than one node is connected to a switch, server/cores are -distributed in a cyclic manner. 
- -Example: Consider the following config file -MODELNET_GRP -{ - repetitions="8 - server="4"; - modelnet_dragonfly="4"; - modelnet_dragonfly_router="1"; -} - -Servers residing in nodes connected to the first router gets global rank 0-3, -second router gets global rank 4-7, and so on. - -Now consider this case: -MODELNET_GRP -{ - repetitions="8 - server="8"; - modelnet_dragonfly="4"; - modelnet_dragonfly_router="1"; -} - -Servers residing in nodes connected to the first router gets global rank 0-7, -second router gets global rank 8-15, and so on. However, there are 8 servers -but only 4 nodes, so each node hosts 2 servers. The servers are distributed in -a cyclic manner within a router, i.e. in router 0, server 0 is on node 0, 1 is -on node 1, 2 is on node 2, 3 is node 3, 4 is on node 0, 5 is on node 1, 6 is on -node 2, and 7 is on node 3. Similar cyclic distribution is done within every -switch. - -Map file requirements: ---------------------- -Map files are divided into two sets: global map file and individual job files. -The global file specifies how the global rank are mapped to individual jobs and -ranks within those jobs. It is a binary file structured as sets of 3 integers: - . Typical write routine look like: - -for(....) - fwrite(&global_rank, sizeof(int), 1, binout); - fwrite(&local_rank, sizeof(int), 1, binout); - fwrite(&jobid, sizeof(int), 1, binout); -endfor - -For each job, individual job map files are needed. A map file for a job is also a -binary file filled with a series of global ranks. The global ranks are ordered -by using the local ranks as the key. So, if the series of integers is loaded -into an array called local_to_global, local_to_global[i] will contain the global -rank of local rank i. - -Note for author: Eliminate individual job map files and make life easier for -users. - -Job mappers ------------------- -def_lin_mapping.C : generate linear mapping which is also the default mapping -when no mapping is specified. 
If nodes per router is more than 1, then this -mapping will spread the ranks in a round-robin fashion among the nodes. - -node_mapping.C : generates mapping that always places server with contiguous -global ranks on a node. That, if there 2 servers per node, ranks 0-1 are on node -0, ranks 2-3 are on node 1, and so on. - -multi_job.C : Router based various schemes for mapping. -many_job.C : Nodes based various schemes for mapping. - -Commands for execution ----------------------- -./def_lin_mapping -./node_mapping [optional ] - -Output - - in binary format -job{0,1..} files in binary format - -Example: -./def_lin_mapping global.bin 32 32 64 - -generates global.bin with 128 ranks, where first 32 are mapped to job0, next 32 -to job1, and last 64 to job2. Also generates job0, job1, job2 that maps ranks -from these jobs to global ranks. - From 7d7e59f23b9a3b97597ed1e169a31c354b90ada9 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Fri, 28 Jun 2019 14:10:13 -0700 Subject: [PATCH 02/16] Add WallTime macro for loop body timing in code example --- docs/code-examples/scorep_user_calls.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/docs/code-examples/scorep_user_calls.c b/docs/code-examples/scorep_user_calls.c index 898b318..24c47db 100644 --- a/docs/code-examples/scorep_user_calls.c +++ b/docs/code-examples/scorep_user_calls.c @@ -6,20 +6,28 @@ int main(int argc, char **argv, char **envp) SCOREP_RECORDING_OFF(); //turn recording off for initialization/regions not of interest ... 
SCOREP_RECORDING_ON(); + //use verbatim to facilitate looping over the traces in simulation when simulating multiple jobs SCOREP_USER_REGION_BY_NAME_BEGIN("TRACER_Loop", SCOREP_USER_REGION_TYPE_COMMON); // at least add this BEGIN timer call - called from only one rank // you can add more calls later with region names TRACER_WallTime_ + if(myRank == 0) - SCOREP_USER_REGION_BY_NAME_BEGIN("TRACER_WallTime_MainLoop", SCOREP_USER_REGION_TYPE_COMMON); + SCOREP_USER_REGION_BY_NAME_BEGIN("TRACER_WallTime_Loop", SCOREP_USER_REGION_TYPE_COMMON); + // Application main work LOOP for ( int itscf = 0; itscf < nitscf_; itscf++ ) { + // time call to mark start of loop iteration + SCOREP_USER_REGION_BY_NAME_BEGIN("TRACER_WallTime_Loop_Iter", SCOREP_USER_REGION_TYPE_COMMON); ... + SCOREP_USER_REGION_BY_NAME_END("TRACER_WallTime_Loop_Iter"); } + // time call to mark END of work - called from only one rank if(myRank == 0) - SCOREP_USER_REGION_BY_NAME_END("TRACER_WallTime_MainLoop"); + SCOREP_USER_REGION_BY_NAME_END("TRACER_WallTime_Loop"); + // use verbatim - mark end of trace loop SCOREP_USER_REGION_BY_NAME_END("TRACER_Loop"); SCOREP_RECORDING_OFF();//turn off recording again From d2f8d88b3190e3347df92a2f284e45afa0dd7097 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Fri, 28 Jun 2019 14:11:25 -0700 Subject: [PATCH 03/16] Split user guide rst file into reuseable chunks, and add job placement file format guide --- docs/userguide.rst | 244 ++------------------------ docs/userguide/bigsim.rst | 75 ++++++++ docs/userguide/codes-config-file.rst | 110 ++++++++++++ docs/userguide/job-placement-file.rst | 105 +++++++++++ docs/userguide/score-p.rst | 62 +++++++ docs/userguide/tracer-config-file.rst | 17 ++ 6 files changed, 383 insertions(+), 230 deletions(-) create mode 100644 docs/userguide/bigsim.rst create mode 100644 docs/userguide/codes-config-file.rst create mode 100644 docs/userguide/job-placement-file.rst create mode 100644 docs/userguide/score-p.rst create mode 100644 
docs/userguide/tracer-config-file.rst diff --git a/docs/userguide.rst b/docs/userguide.rst index 6baef74..33fd74d 100644 --- a/docs/userguide.rst +++ b/docs/userguide.rst @@ -4,6 +4,7 @@ User Guide Below, we provide detailed instructions for how to start doing network simulations using TraceR. +.. _userguide-quickstart: Quickstart ---------- @@ -20,242 +21,25 @@ Some useful options to use with TraceR: --max-opt-lookahead leash on optimistic execution in nanoseconds (1 microsecond is a good value) --timer-frequency frequency with which PE0 should print current virtual time -Creating a TraceR configuration file ------------------------------------- +Setting up a Simulation +----------------------- -This is the format for the TraceR config file:: +.. _userguide-tracer-config-file: +.. include:: userguide/tracer-config-file.rst - - - - - ... - +See :ref:`userguide-job-placement-file` below for how to generate global or per-job map files. +.. _userguide-codes-config-file: +.. include:: userguide/codes-config-file.rst -If you do not intend to create global or per-job map files, you can use ``NA`` -instead of them. - -Sample TraceR config files can be found in examples/jacobi2d-bigsim/tracer_config (BigSim) or examples/stencil4d-otf/tracer_config (OTF) - -See `Creating the job placement file`_ below for how to generate global or per-job map files. - -Creating the network (CODES) configuration file ------------------------------------------------ -Sample network configuration files can be found in examples/conf - -Additional documentation on the format of the CODES config file can be found in the -CODES wiki at https://xgitlab.cels.anl.gov/codes/codes/wikis/home - -A brief summary of the format follows. - -LPGROUPS, MODELNET_GRP, PARAMS are keywords and should be used as is. - -MODELNET_GRP:: - - repetition = number of routers that have nodes connecting to them. - - server = number of MPI processes/cores per router - - modelnet_* = number of NICs. 
For torus, this value has to be 1; for dragonfly, - it should be router radix divided by 4; for the fat-tree, it should be router - radix divided by 2. For the dragonfly network, modelnet_dragonfly_router should - also be specified (as 1). For express mesh, modelnet_express_mesh_router should - also be specified as 1. - - Similarly, the fat-tree config file requires specifying fattree_switch which - can be 2 or 3, depending on the number of levels in the fat-tree. Note that the - total number of cores specified in the CODES config file can be greater than - the number of MPI processes being simulated (specified in the tracer config - file). - -Other common parameters:: - - packet_size/chunk_size (both should have the same value) = size of the packets - created by NIC for transmission on the network. Smaller the packet size, longer - the time for which simulation will run (in real time). Larger the packet size, - the less accurate the predictions are expected to be (in virtual time). Packet - sizes of 512 bytes to 4096 bytes are commonly used. - - modelnet_order = torus/dragonfly/fattree/slimfly/express_mesh - - modelnet_scheduler = - fcfs: packetize messages one by one. - round-robin: packetize message in a round robin manner. - - message_size = PDES parameter (keep constant at 512) - - router_delay = delay at each router for packet transmission (in nanoseconds) - - soft_delay = delay caused by software stack such as that of MPI (in nanoseconds) - - link_bandwidth = bandwidth of each link in the system (in GB/s) - - cn_bandwidth = bandwidth of connection between NIC and router (in GB/s) - - buffer_size/vc_size = size of channels used to store transient packets at routers (in - bytes). Typical value is 64*packet_size. - - routing = how are packets being routed. Options depend on the network. 
- torus: static/adaptive - dragonfly: minimal/nonminimal/adaptive - fat-tree: adaptive/static - -Network specific parameters:: - - Torus: - n_dims = number of dimensions in the torus - dim_length = length of each dimension - - Dragonfly: - num_routers = number of routers within a group. - global_bandwidth = bandwidth of the links that connect groups. - - Fat-tree: - ft_type = always choose 1 - num_levels = number of levels in the fat-tree (2 or 3) - switch_radix = radix of the switch being used - switch_count = number of switches at leaf level. - -Creating the job placement file -------------------------------- - -See the README in utils for instructions on using the tools to generate the global and job mapping files. +.. _userguide-job-placement-file: +.. include:: userguide/job-placement-file.rst Generating Traces ----------------- -Score-P -^^^^^^^ - -Installation of Score-P -""""""""""""""""""""""" - -1. Download from http://www.vi-hps.org/projects/score-p/ -#. tar -xvzf scorep-3.0.tar.gz -#. cd scorep-3.0 -#. CC=mpicc CFLAGS="-O2" CXX=mpicxx CXXFLAGS="-O2" FC=mpif77 ./configure --without-gui --prefix= -#. make -#. make install - -Generating OTF2 traces with an MPI program using Score-P -"""""""""""""""""""""""""""""""""""""""""""""""""""""""" - -Detailed instructions are available at https://silc.zih.tu-dresden.de/scorep-current/pdf/scorep.pdf. - -1. Add $SCOREP_INSTALL/bin to your PATH for convenience. Example:: - - export SCOREP_INSTALL=$HOME/workspace/scoreP/scorep-3.0/install - export PATH=$SCOREP_INSTALL/bin:$PATH - -2. Add the following compile time flags to the application:: - - -I$SCOREP_INSTALL/include -I$SCOREP_INSTALL/include/scorep -DSCOREP_USER_ENABLE - -3. 
Add #include to all files where you plan to add any of the following Score-P calls (optional step):: - - SCOREP_RECORDING_OFF(); - stop recording - SCOREP_RECORDING_ON(); - start recording - - Marking special regions: SCOREP_USER_REGION_BY_NAME_BEGIN(regionname, SCOREP_USER_REGION_TYPE_COMMON) and SCOREP_USER_REGION_BY_NAME_END(regionname). - - Region names beginning with TRACER_WallTime\_ are special: using TRACER_WallTime_ prints current time during simulation with tag . - - An example using these features is given below: - - .. literalinclude:: code-examples/scorep_user_calls.c - :language: c - -4. For the link step, prefix the linker line with the following:: - - LD = scorep --user --nocompiler --noopenmp --nopomp --nocuda --noopenacc --noopencl --nomemory - -5. For running, set:: - - export SCOREP_ENABLE_TRACING=1 - export SCOREP_ENABLE_PROFILING=0 - export SCOREP_REDUCE_PROBE_TEST=1 - export SCOREP_MPI_ENABLE_GROUPS=ENV,P2P,COLL,XNONBLOCK - - If Score-P prints a warning about flushing traces during the run, you may avoid them using:: - - export SCOREP_TOTAL_MEMORY=256M - export SCOREP_EXPERIMENT_DIRECTORY=/p/lscratchd//... - -6. Run the binary and traces should be generated in a folder named scorep-\*. - -BigSim -^^^^^^ - -Installation of BigSim -"""""""""""""""""""""" - -Compile BigSim/Charm++ for emulation (see http://charm.cs.illinois.edu/manuals/html/bigsim/manual-1p.html -for more detail). Use any one of the following commands: - -- To use UDP as BigSim/Charm++'s communication layer:: - - ./build bgampi net-linux-x86_64 bigemulator --with-production --enable-tracing - ./build bgampi net-darwin-x86_64 bigemulator --with-production --enable-tracing - - Or explicitly provide the compiler optimization level:: - - ./build bgampi net-linux-x86_64 bigemulator -O2 - -- To use MPI as BigSim/Charm++'s communication layer:: - - ./build bgampi mpi-linux-x86_64 bigemulator --with-production --enable-tracing - -.. 
note:: - This build is used to compile MPI applications so that traces can be - generated. Hence, the communication layer used by BigSim/Charm++ is not - important. During simulation, the communication will be replayed using the - network simulator from CODES. However, the computation time captured here can be - important if it is not being explicitly replaced at simulation time using - configuration options. So using appropriate compiler flags is important. - -Generating AMPI traces with an MPI program using BigSim -""""""""""""""""""""""""""""""""""""""""""""""""""""""" - -1. Compile your MPI application using BigSim/Charm++. - - Example commands:: - - $CHARM_DIR/bin/ampicc -O2 simplePrg.c -o simplePrg_c - $CHARM_DIR/bin/ampiCC -O2 simplePrg.cc -o simplePrg_cxx - -2. Emulation to generate traces. When the binary generated is run, - BigSim/Charm++ runs the program on the allocated cores as if it were - running as usual. Users should provide a few additional arguments to - specify the number of MPI processes in the prototype systems. - - If using UDP as the BigSim/Charm++'s communication layer:: - - ./charmrun +p ++nodelist ./pgm +vp +x +y +z +bglog - - If using MPI as the BigSim/Charm++'s communication layer:: - - mpirun -n ./pgm +vp +x +y +z +bglog - - Number of real processes is typically equal to the number cores the emulation - is being run on. - - *machine file* is the list of systems the emulation should be run on (similar to - machine file for MPI; refer to Charm++ website for more details). - - *vp* is the number of MPI ranks that are to be emulated. For simple tests, it can - be the same as the number of real processes, in which case one MPI rank is run on - each real process (as it happens when a regular program is run). When the - number of vp (virtual processes) is higher, BigSim launches user level threads - to execute multiple MPI ranks within a process. - - *+x +y +z* defines a 3D grid of the virtual processes. 
The product of these three - dimensions must match the number of vp's. These arguments do not have any - effect on the emulation, but exist due to historical reasons. - - *+bglog* instructs bigsim to write the logs to files. +.. _userguide-score-p: +.. include:: userguide/score-p.rst -3. When this run is finished, you should see many files named *bgTrace\** in the - directory. The total number of such files equals the number of real processes - plus one. Their names are bgTrace, bgTrace0, bgTrace1, and so on. - Create a new folder and move all *bgTrace* files to that folder. +.. _userguide-bigsim: +.. include:: userguide/bigsim.rst \ No newline at end of file diff --git a/docs/userguide/bigsim.rst b/docs/userguide/bigsim.rst new file mode 100644 index 0000000..64100c4 --- /dev/null +++ b/docs/userguide/bigsim.rst @@ -0,0 +1,75 @@ +BigSim +^^^^^^ + +Installation of BigSim +"""""""""""""""""""""" + +Compile BigSim/Charm++ for emulation (see http://charm.cs.illinois.edu/manuals/html/bigsim/manual-1p.html +for more detail). Use any one of the following commands: + +- To use UDP as BigSim/Charm++'s communication layer:: + + ./build bgampi net-linux-x86_64 bigemulator --with-production --enable-tracing + ./build bgampi net-darwin-x86_64 bigemulator --with-production --enable-tracing + + Or explicitly provide the compiler optimization level:: + + ./build bgampi net-linux-x86_64 bigemulator -O2 + +- To use MPI as BigSim/Charm++'s communication layer:: + + ./build bgampi mpi-linux-x86_64 bigemulator --with-production --enable-tracing + +.. note:: + This build is used to compile MPI applications so that traces can be + generated. Hence, the communication layer used by BigSim/Charm++ is not + important. During simulation, the communication will be replayed using the + network simulator from CODES. However, the computation time captured here can be + important if it is not being explicitly replaced at simulation time using + configuration options. 
So using appropriate compiler flags is important. + +Generating AMPI traces with an MPI program using BigSim +""""""""""""""""""""""""""""""""""""""""""""""""""""""" + +1. Compile your MPI application using BigSim/Charm++. + + Example commands:: + + $CHARM_DIR/bin/ampicc -O2 simplePrg.c -o simplePrg_c + $CHARM_DIR/bin/ampiCC -O2 simplePrg.cc -o simplePrg_cxx + +2. Run the emulation to generate traces. When the generated binary is run, + BigSim/Charm++ runs the program on the allocated cores as if it were + running as usual. Users should provide a few additional arguments to + specify the number of MPI processes in the prototype system. + + If using UDP as BigSim/Charm++'s communication layer:: + + ./charmrun +p ++nodelist ./pgm +vp +x +y +z +bglog + + If using MPI as BigSim/Charm++'s communication layer:: + + mpirun -n ./pgm +vp +x +y +z +bglog + + The number of real processes is typically equal to the number of cores the + emulation is run on. + + *machine file* is the list of systems the emulation should be run on (similar to + a machine file for MPI; refer to the Charm++ website for more details). + + *vp* is the number of MPI ranks that are to be emulated. For simple tests, it can + be the same as the number of real processes, in which case one MPI rank is run on + each real process (as happens when a regular program is run). When the + number of vp (virtual processes) is higher, BigSim launches user-level threads + to execute multiple MPI ranks within a process. + + *+x +y +z* defines a 3D grid of the virtual processes. The product of these three + dimensions must match the number of vp's. These arguments do not have any + effect on the emulation, but exist for historical reasons. + + *+bglog* instructs BigSim to write the logs to files. + +3. When this run is finished, you should see many files named *bgTrace\** in the + directory. The total number of such files equals the number of real processes + plus one. 
Their names are bgTrace, bgTrace0, bgTrace1, and so on. + Create a new folder and move all *bgTrace* files to that folder. \ No newline at end of file diff --git a/docs/userguide/codes-config-file.rst b/docs/userguide/codes-config-file.rst new file mode 100644 index 0000000..abb0474 --- /dev/null +++ b/docs/userguide/codes-config-file.rst @@ -0,0 +1,110 @@ +Creating the network (CODES) configuration file +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Sample network configuration files can be found in examples/conf + +Additional documentation on the format of the CODES config file can be found in the +CODES wiki at https://xgitlab.cels.anl.gov/codes/codes/wikis/home + +A brief summary of the format follows. + +LPGROUPS, MODELNET_GRP, PARAMS are keywords and should be used as is. + +MODELNET_GRP +"""""""""""" +repetition + number of routers that have nodes connecting to them. + +server + number of MPI processes/cores per router + +modelnet_* + number of NICs. For torus, this value has to be 1; for dragonfly, + it should be router radix divided by 4; for the fat-tree, it should be router + radix divided by 2. For the dragonfly network, modelnet_dragonfly_router should + also be specified (as 1). For express mesh, modelnet_express_mesh_router should + also be specified as 1. + + Similarly, the fat-tree config file requires specifying fattree_switch which + can be 2 or 3, depending on the number of levels in the fat-tree. Note that the + total number of cores specified in the CODES config file can be greater than + the number of MPI processes being simulated (specified in the tracer config + file). + +Common parameters +""""""""""""""""" + +packet_size/chunk_size (both should have the same value) + size of the packets created by NIC for transmission on the network. Smaller the + packet size, longer the time for which simulation will run (in real time). Larger + the packet size, the less accurate the predictions are expected to be (in virtual + time). 
Packet sizes of 512 bytes to 4096 bytes are commonly used. + +modelnet_order + torus/dragonfly/fattree/slimfly/express_mesh + +modelnet_scheduler + fcfs: packetize messages one by one. + + round-robin: packetize messages in a round-robin manner. + +message_size + PDES parameter (keep constant at 512) + +router_delay + delay at each router for packet transmission (in nanoseconds) + +soft_delay + delay caused by the software stack, such as that of MPI (in nanoseconds) + +link_bandwidth + bandwidth of each link in the system (in GB/s) + +cn_bandwidth + bandwidth of the connection between NIC and router (in GB/s) + +buffer_size/vc_size + size of channels used to store transient packets at routers (in + bytes). Typical value is 64*packet_size. + +routing + how packets are routed. Options depend on the network. + + torus: static/adaptive + + dragonfly: minimal/nonminimal/adaptive + + fat-tree: adaptive/static + +Network specific parameters +""""""""""""""""""""""""""" + +Torus: + +n_dims + number of dimensions in the torus + +dim_length + length of each dimension + +Dragonfly: + +num_routers + number of routers within a group. + +global_bandwidth + bandwidth of the links that connect groups. + +Fat-tree: + +ft_type + always choose 1 + +num_levels + number of levels in the fat-tree (2 or 3) + +switch_radix + radix of the switch being used + +switch_count + number of switches at leaf level. \ No newline at end of file diff --git a/docs/userguide/job-placement-file.rst b/docs/userguide/job-placement-file.rst new file mode 100644 index 0000000..5950586 --- /dev/null +++ b/docs/userguide/job-placement-file.rst @@ -0,0 +1,105 @@ +Creating the job placement file +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Ranking basics +"""""""""""""" + +TraceR requires two sets of mapping files (with somewhat redundant information). +Both types of files provide information about the mapping of global ranks to jobs and +their local ranks. 
Global rank of a server/core is the logical rank that server LPs +get inside CODES. It increases linearly for servers/cores connected from one switch +to another. Due to the way default server to node mapping works within CODES, if +more than one node is connected to a switch, servers/cores are distributed in a +cyclic manner. + +Example config file:: + + MODELNET_GRP + { + repetitions="8"; + server="4"; + modelnet_dragonfly="4"; + modelnet_dragonfly_router="1"; + } + +Servers residing in nodes connected to the first router get global ranks 0-3, +nodes connected to the second router get global ranks 4-7, and so on. + +Now, consider another case:: + + MODELNET_GRP + { + repetitions="8"; + server="8"; + modelnet_dragonfly="4"; + modelnet_dragonfly_router="1"; + } + +Servers residing in nodes connected to the first router get global ranks 0-7, +nodes connected to the second router get global ranks 8-15, and so on. However, +there are 8 servers but only 4 nodes, so each node hosts 2 servers. The servers +are distributed in a cyclic manner within a router, i.e. in router 0, server 0 +is on node 0, 1 is on node 1, 2 is on node 2, 3 is on node 3, 4 is on node 0, 5 +is on node 1, 6 is on node 2, and 7 is on node 3. Similar cyclic distribution is +done within every switch. + +Map file requirements +""""""""""""""""""""" + +Map files are divided into two sets: global map files and individual job files. +The global file specifies how the global ranks are mapped to individual jobs and +ranks within those jobs. It is a binary file structured as sets of 3 integers: + . A typical write routine looks like this: + +.. code:: + + for(...) + fwrite(&global_rank, sizeof(int), 1, binout); + fwrite(&local_rank, sizeof(int), 1, binout); + fwrite(&jobid, sizeof(int), 1, binout); + endfor + +For each job, individual job map files are needed. A map file for a job is also a +binary file filled with a series of global ranks. The global ranks are ordered by +using the local ranks as the key. 
So, if the series of integers is loaded into an +array called local_to_global, local_to_global[i] will contain the global rank of +local rank i. + +Job mappers +""""""""""" + +In the utils subfolder of the TraceR repository, there are several job mappers +written in C that can be used to generate job map files with various layouts. +Eventually these will likely be rewritten as a Python script. A brief summary +of the generators provided follows. + +def_lin_mapping.C + Generates a linear mapping which is also the default mapping + when no mapping is specified. If nodes per router is more than 1, then this + mapping will spread the ranks in a round-robin fashion among the nodes. + +node_mapping.C + Generates a mapping that always places servers with contiguous + global ranks on a node. That is, if there are 2 servers per node, ranks 0-1 are + on node 0, ranks 2-3 are on node 1, and so on. + +multi_job.C + Various router-based mapping schemes. + +many_job.C + Various node-based mapping schemes. + +Commands for execution +"""""""""""""""""""""" +./def_lin_mapping +./node_mapping [optional ] + +The output from these commands will be a global map file, and job{0,1..} files in binary format. + +Example: + + ./def_lin_mapping global.bin 32 32 64 + +The above command generates global.bin with 128 ranks, where the first 32 are mapped to job0, +the next 32 to job1, and the last 64 to job2. It also generates job0, job1, and job2 files that map +ranks from these jobs to global ranks. \ No newline at end of file diff --git a/docs/userguide/score-p.rst b/docs/userguide/score-p.rst new file mode 100644 index 0000000..e37f48f --- /dev/null +++ b/docs/userguide/score-p.rst @@ -0,0 +1,62 @@ +Score-P +^^^^^^^ + +Installation of Score-P +""""""""""""""""""""""" + +1. Download from http://www.vi-hps.org/projects/score-p/ +#. tar -xvzf scorep-5.0.tar.gz +#. cd scorep-5.0 +#. CC=mpicc CFLAGS="-O2" CXX=mpicxx CXXFLAGS="-O2" FC=mpif77 ./configure --without-gui --prefix= +#. make +#. 
make install + +Generating OTF2 traces with an MPI program using Score-P +"""""""""""""""""""""""""""""""""""""""""""""""""""""""" + +Detailed instructions are available at https://silc.zih.tu-dresden.de/scorep-current/pdf/scorep.pdf. + +1. Add $SCOREP_INSTALL/bin to your PATH for convenience. Example:: + + export SCOREP_INSTALL=$HOME/workspace/scoreP/scorep-5.0/install + export PATH=$SCOREP_INSTALL/bin:$PATH + +2. Add the following compile time flags to the application:: + + -I$SCOREP_INSTALL/include -I$SCOREP_INSTALL/include/scorep -DSCOREP_USER_ENABLE + +3. Add #include to all files where you plan to add any of the following Score-P calls (optional step):: + + SCOREP_RECORDING_OFF(); - stop recording + SCOREP_RECORDING_ON(); - start recording + + Marking special regions: SCOREP_USER_REGION_BY_NAME_BEGIN(regionname, SCOREP_USER_REGION_TYPE_COMMON) and SCOREP_USER_REGION_BY_NAME_END(regionname). + + Region names beginning with TRACER_WallTime\_ are special: using TRACER_WallTime_ prints current time during simulation with tag . + + An example using these features is given below: + + .. literalinclude:: code-examples/scorep_user_calls.c + :language: c + +4. For the link step, prefix the linker line with the following:: + + LD = scorep --user --nocompiler --noopenmp --nopomp --nocuda --noopenacc --noopencl --nomemory + +5. For running, set:: + + export SCOREP_ENABLE_TRACING=1 + export SCOREP_ENABLE_PROFILING=0 + export SCOREP_MPI_ENABLE_GROUPS=ENV,P2P,COLL,XNONBLOCK + + If Score-P prints a warning about flushing traces during the run, you may avoid them using:: + + export SCOREP_TOTAL_MEMORY=256M + export SCOREP_EXPERIMENT_DIRECTORY=/p/lscratchd//... + + .. note:: + For larger simulations, performance can get slow. There is a :download:`patch for Score-P 5.0 ` that + adds an option to reduce the number of MPI Probes. After applying the patch, it can be enabled like the other Score-P + options with ``export SCOREP_REDUCE_PROBE_TEST=1``. + +6. 
Run the binary and traces should be generated in a folder named scorep-\*. \ No newline at end of file diff --git a/docs/userguide/tracer-config-file.rst b/docs/userguide/tracer-config-file.rst new file mode 100644 index 0000000..c67c9f3 --- /dev/null +++ b/docs/userguide/tracer-config-file.rst @@ -0,0 +1,17 @@ +Creating a TraceR configuration file +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This is the format for the TraceR config file:: + + + + + + ... + + + +If you do not intend to create global or per-job map files, you can use ``NA`` +instead of them. + +Sample TraceR config files can be found in examples/jacobi2d-bigsim/tracer_config (BigSim) or examples/stencil4d-otf/tracer_config (OTF) \ No newline at end of file From 14f8c9351e657e59945b5ba004e52d2bded86aa2 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Fri, 28 Jun 2019 14:12:10 -0700 Subject: [PATCH 04/16] Add brief expected workflow derived from UserWriteUp.txt --- docs/index.rst | 1 + docs/workflow.rst | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+) create mode 100644 docs/workflow.rst diff --git a/docs/index.rst b/docs/index.rst index 573d209..042d27c 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -17,6 +17,7 @@ Computing applications on interconnection networks. install userguide + workflow tutorial autogen/doxygen diff --git a/docs/workflow.rst b/docs/workflow.rst new file mode 100644 index 0000000..0e8c478 --- /dev/null +++ b/docs/workflow.rst @@ -0,0 +1,38 @@ +Expected Workflow +================= + +This guide will walk you through the expected workflow for using TraceR. +It will direct you to resources on generating BigSim and OTF2 traces, the +format of a TraceR configuration file, and a basic command for running a +simulation. 
+ +TraceR is a replay tool that simulates the control flow of an application on +prototype systems, i.e., if the control flow of an application, which includes +expected computation tasks, communication routines, and their dependencies, is +provided to TraceR, it will mimic the flow on a hypothetical system with a given +compute and communication capability. As of now, the control flow is captured by +either emulating applications using BigSim or by linking with Score-P. CODES +is used for simulating the communication on the network. + +1. Write an MPI application. (Avoid global variables so that the application can be + run with virtualization if using BigSim). Included in the TraceR repository are + two examples: jacobi2d-bigsim and stencil4d-otf. The jacobi2d-bigsim example + shows how a program would be compiled to generate BigSim traces, and the + stencil4d-otf example shows how to compile a program for generating OTF2 traces. + + .. note:: + If you're using BigSim, avoid global variables in your MPI application so that it can be run with virtualization. + +2. Generate traces. For instructions on generating OTF2 traces, see the user guide + section on using :ref:`userguide-score-p`, or for using BigSim traces see the section in + the user guide about :ref:`userguide-bigsim`. + +3. After generating traces, two files are needed: a TraceR config file and a CODES config file. + Optionally, mapping files can also be provided. See :ref:`userguide-tracer-config-file`, :ref:`userguide-codes-config-file`, + and :ref:`userguide-job-placement-file` in the user guide for instructions on creating the files. + +4. Run the simulation using ``mpirun``. For details on options available, see the + :ref:`quickstart section of the user guide `. This command will + run a simulation in optimistic mode:: + + mpirun -np

../traceR --sync=3 -- \ No newline at end of file From c2d5e16986f84678a193ba5ee39de3945a82c6f0 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Fri, 28 Jun 2019 14:22:30 -0700 Subject: [PATCH 05/16] Remove : after Contents heading in the navbar --- docs/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/index.rst b/docs/index.rst index 042d27c..90b0d12 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -13,7 +13,7 @@ Computing applications on interconnection networks. .. toctree:: :maxdepth: 2 - :caption: Contents: + :caption: Contents install userguide From 24b31d4689fb778cad407fb4cdfd68057c7536c2 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Fri, 28 Jun 2019 14:31:04 -0700 Subject: [PATCH 06/16] Formatting clean-up --- docs/userguide/bigsim.rst | 2 +- docs/userguide/codes-config-file.rst | 40 +++++++++++++-------------- docs/userguide/job-placement-file.rst | 5 ++-- 3 files changed, 24 insertions(+), 23 deletions(-) diff --git a/docs/userguide/bigsim.rst b/docs/userguide/bigsim.rst index 64100c4..70af1e3 100644 --- a/docs/userguide/bigsim.rst +++ b/docs/userguide/bigsim.rst @@ -4,7 +4,7 @@ BigSim Installation of BigSim """""""""""""""""""""" -Compile BigSim/Charm++ for emulation (see http://charm.cs.illinois.edu/manuals/html/bigsim/manual-1p.html +Compile BigSim/Charm++ for emulation (see the `BigSim manual `_ for more detail). Use any one of the following commands: - To use UDP as BigSim/Charm++'s communication layer:: diff --git a/docs/userguide/codes-config-file.rst b/docs/userguide/codes-config-file.rst index abb0474..2447d8e 100644 --- a/docs/userguide/codes-config-file.rst +++ b/docs/userguide/codes-config-file.rst @@ -31,8 +31,8 @@ modelnet_* the number of MPI processes being simulated (specified in the tracer config file). 
-Common parameters -""""""""""""""""" +Common parameters (PARAMS) +"""""""""""""""""""""""""" packet_size/chunk_size (both should have the same value) size of the packets created by NIC for transmission on the network. Smaller the @@ -76,35 +76,35 @@ routing fat-tree: adaptive/static -Network specific parameters -""""""""""""""""""""""""""" +Network specific parameters (PARAMS) +"""""""""""""""""""""""""""""""""""" Torus: -n_dims - number of dimensions in the torus + n_dims + number of dimensions in the torus -dim_length - length of each dimension + dim_length + length of each dimension Dragonfly: -num_routers - number of routers within a group. + num_routers + number of routers within a group. -global_bandwidth - bandwidth of the links that connect groups. + global_bandwidth + bandwidth of the links that connect groups. Fat-tree: -ft_type - always choose 1 + ft_type + always choose 1 -num_levels - number of levels in the fat-tree (2 or 3) + num_levels + number of levels in the fat-tree (2 or 3) -switch_radix - radix of the switch being used + switch_radix + radix of the switch being used -switch_count - number of switches at leaf level. \ No newline at end of file + switch_count + number of switches at leaf level. \ No newline at end of file diff --git a/docs/userguide/job-placement-file.rst b/docs/userguide/job-placement-file.rst index 5950586..debcbac 100644 --- a/docs/userguide/job-placement-file.rst +++ b/docs/userguide/job-placement-file.rst @@ -12,7 +12,7 @@ to another. Due to the way default server to node mapping works within CODES, if more than one node is connected to a switch, servers/cores are distributed in a cyclic manner. -Example config file:: +Consider this example config file:: MODELNET_GRP { @@ -92,11 +92,12 @@ many_job.C Commands for execution """""""""""""""""""""" ./def_lin_mapping + ./node_mapping [optional ] The output from these commands will be a global map file, and job{0,1..} files in binary format. 
-Example: +Example:: ./def_lin_mapping global.bin 32 32 64 From d9f616b4428955480a542e5ae1b7e7b1ea5fb079 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Tue, 9 Jul 2019 13:59:38 -0700 Subject: [PATCH 07/16] Update CODES repository URL for Travis build --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 64857b9..9ce063a 100644 --- a/.travis.yml +++ b/.travis.yml @@ -20,7 +20,7 @@ install: popd # Install CODES - | - git clone https://xgitlab.cels.anl.gov/codes/codes.git ${TRAVIS_BUILD_DIR}/ci-build-deps/CODES + git clone https://github.com/codes-org/codes.git ${TRAVIS_BUILD_DIR}/ci-build-deps/CODES pushd ${TRAVIS_BUILD_DIR}/ci-build-deps/CODES ./prepare.sh mkdir build From 21727e286fb455c6af8a7902985e5c1ea59bd213 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Tue, 9 Jul 2019 14:01:40 -0700 Subject: [PATCH 08/16] Add tutorial section to docs --- docs/install.rst | 4 +- docs/tutorial.rst | 33 +++ docs/tutorial/hoti25-slide-preview.png | Bin 0 -> 22593 bytes docs/tutorial/network_models.rst | 393 +++++++++++++++++++++++++ docs/tutorial/simulation_basics.rst | 124 ++++++++ docs/{ => tutorial}/workflow.rst | 2 + 6 files changed, 554 insertions(+), 2 deletions(-) create mode 100644 docs/tutorial/hoti25-slide-preview.png create mode 100644 docs/tutorial/network_models.rst create mode 100644 docs/tutorial/simulation_basics.rst rename docs/{ => tutorial}/workflow.rst (98%) diff --git a/docs/install.rst b/docs/install.rst index 098dd62..d314891 100644 --- a/docs/install.rst +++ b/docs/install.rst @@ -6,7 +6,7 @@ TraceR can be downloaded from `GitHub `_. Dependencies ------------ -TraceR depends on `CODES `_ and `ROSS `_. +TraceR depends on `CODES `_ and `ROSS `_. Build ----- @@ -42,5 +42,5 @@ TraceR supports two different trace formats as input. For each format, you need 2. AMPI-based BigSim format: To use BigSim traces as input to TraceR, you need to download and build `Charm++ `_. 
The instructions to build Charm++ are in the `Charm++ manual -`_. You should use +`_. You should use the "charm++" target and pass "bigemulator" as a build option. diff --git a/docs/tutorial.rst b/docs/tutorial.rst index b23b9e5..cc320c1 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -1,2 +1,35 @@ +.. _tutorial: + Tutorial ======== + +.. rubric:: Slides + +.. figure:: tutorial/hoti25-slide-preview.png + :target: http://www.hoti.org/tutorials/HOTI25_Tutorial_2c.pdf + :height: 72px + :align: left + :alt: Slide preview + +`Download Slides `_. + +**Full citation:** Nikhil Jain and Misbah Mubarak. +`CODES-TRACER Tutorial: Enabling HPC Design Space +Exploration via Discrete-Event Simulation +`_. +Tutorial presented at 25th Annual Symposium on High Performance +Interconnects (HOTI). Aug 28, 2017, Santa Clara, CA, USA. + +.. rubric:: Guides + +These guides will give some of the basics needed to use TraceR. +1. :ref:`tutorial/network_models` +2. :ref:`tutorial/simulation_basics` +3. :ref:`tutorial/workflow` + +Full contents: + +.. 
toctree:: + tutorial/network_models + tutorial/simulation_basics + tutorial/workflow \ No newline at end of file diff --git a/docs/tutorial/hoti25-slide-preview.png b/docs/tutorial/hoti25-slide-preview.png new file mode 100644 index 0000000000000000000000000000000000000000..88b8bc145920c7b7bc24bd600de402fadb7fcb62 GIT binary patch literal 22593 zcmbq)Wl$YY^d(Mk3+|8r!QCY|!QI_mg1fs1cL*NbT_5i5@Nl<>+k@p-_5ZR}`(d~C z!_4%Z>YllEr>DEm?Q628EFo{vnL5(7E}cC5Qlj1T0`=TpZ<)glWui$>yiOSAt8j5g2OnIzAzn$xD|o7njVX>k zy7l$L&Gm@q6Q3#s`uT~LC`Ne!=@1O0R*6e?BU*!UG28~bo|broMBPjuz{Ls;leI>xl@50%YinzCBBGda#tvkz4M*s; z76*lAP|{ObMMZGoopG+8>gma4kBt$0%bj430d~87BO)fMyE;Y?3y1B?xR;g^ z3{9Tk_h{=NuXg>~59RwqGp=uN65GCd>I))@SVA7x7~MZdeim$fd;9C_M+q-4oJ?4F zGIoBvU%$%B3VxPTmU8p*=8NBc`FeYqQxGBPr3JC>R2%JSK@#a#R<3F%kVPx6YqqrZMNkBv#+ z(tkT5t=+7pS$Y`$E^1rhjaGZ%jN0;j!^;4$?3FNTS11u%9~u&}0*8!ecs5@e9~)!e zaaP@7D;kc<0o>i)we|D!Yu0KoUoCqkH2#A{F|-;673bk79JIAmEmTl%(D zRD!8r;RoTtVG*#>3M842SJpSH_*dPOqOrv6Q&m-4G^hJKh~13X zJ?~(q7RwOdgQ%mcZnm2{LQ`*cn6fHwj=etjTUvHw?36%9$*MRqO-I#!e2j?=Bt!Ra zvLn0yH(_YvFQH&3*l{S!pt7o}!R9I*(BM&?$$6_29-A9|H~5fHRZX3Om31zChp#Ml z)Z_a87Lcpgp{T4JM4jhHkmvVS7nzrk5TJnn5g#uYwW3~C>$E2^k?TxeDo$^IR9MOBCh*mc1u(2G?`*^3_Ey>Hi1|OEZb%?=opfFIBR!orOvkhd zdEHsc`hdheygb;G@yO=)%1d*hV3D<1dTM&!3958jIDy6zTk&V_%y>2YzpFgE_f{1^ zTLuonTpT_oH;gr915;oGyp1Du-H*E}M|0m&e^y*lc~6|F1MF6CVhhGbH4h`f3d&~L zLFq>=TvwReFtq{GU^vr8i>urOR|HF7b3?)LyZGyQwU66+d ze^pBhkJA|Dfs{L-TCcUCdbc5^Qz%i}egaZ@zGFtpiZdcD;G>=|cqH_l*}IeP0W* zn!lv%M5^?kg+ObE{EgpxwYk>R3toY&9eMy;djA zj-cf-70iwR#pue^p?5?N-%S*);!wRUPj;**HDax~1HsbexR{X@$E2s^KIrkOoFW9? 
Date: Tue, 9 Jul 2019 14:55:28 -0700 Subject: [PATCH 09/16] Formatting cleanup --- docs/tutorial.rst | 7 +- docs/tutorial/network_models.rst | 4 +- docs/tutorial/simulation_basics.rst | 198 ++++++++++++++-------------- docs/tutorial/workflow.rst | 4 +- 4 files changed, 109 insertions(+), 104 deletions(-) diff --git a/docs/tutorial.rst b/docs/tutorial.rst index cc320c1..f26fc3c 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -23,9 +23,10 @@ Interconnects (HOTI). Aug 28, 2017, Santa Clara, CA, USA. .. rubric:: Guides These guides will give some of the basics needed to use TraceR. -1. :ref:`tutorial/network_models` -2. :ref:`tutorial/simulation_basics` -3. :ref:`tutorial/workflow` + + 1. :ref:`tutorial/network_models` + 2. :ref:`tutorial/simulation_basics` + 3.
:ref:`tutorial/workflow` Full contents: diff --git a/docs/tutorial/network_models.rst b/docs/tutorial/network_models.rst index 6456712..be071c5 100644 --- a/docs/tutorial/network_models.rst +++ b/docs/tutorial/network_models.rst @@ -37,7 +37,7 @@ Configuring ^^^^^^^^^^^ Consider this Simplenet configuration file that can be -found in ``codes/tests/conf/modelnet-test.conf``:: +found in *codes/tests/conf/modelnet-test.conf*:: LPGROUPS { @@ -106,7 +106,7 @@ Configuring ^^^^^^^^^^^ Consider this example configuration that can be found with the -CODES source, ``codes/src/network-workloads/dragonfly-custom``:: +CODES source, *codes/src/network-workloads/dragonfly-custom*:: LPGROUPS { diff --git a/docs/tutorial/simulation_basics.rst b/docs/tutorial/simulation_basics.rst index befb582..009ad62 100644 --- a/docs/tutorial/simulation_basics.rst +++ b/docs/tutorial/simulation_basics.rst @@ -5,120 +5,124 @@ Four Steps to Simulations Creating a network simulation can be broken down into 4 steps: +.. contents:: + :depth: 1 + :local: + 1. Prototype the system design ------------------------------ -An overview of setup using network parameters was given -in :ref:`tutorial-network-models`. + An overview of setup using network parameters was given + in the :ref:`tutorial-network-models` guide. -2. Workload selection +#. Workload selection --------------------- -There are two types of workloads that can be used in a simulation, -synthetic workloads and HPC application traces. + There are two types of workloads that can be used in a simulation, + synthetic workloads and HPC application traces. -Synthetic Workloads -^^^^^^^^^^^^^^^^^^^ + Synthetic Workloads + ^^^^^^^^^^^^^^^^^^^ -Synthetic workloads follow specific communication patterns with a -constant injection rate. Often they are used to stress the network -topology to identify best and worst case performance. Examples of -synthetic workloads include uniform random, all to all, bisection -pairing, and bit permutation. 
These workloads don't require simulation -of MPI operations, and could be used to generate background traffic -that can simulate interference with an application trace caused by -a production HPC system having a significant fraction of network nodes -being occupied. + Synthetic workloads follow specific communication patterns with a + constant injection rate. Often they are used to stress the network + topology to identify best and worst case performance. Examples of + synthetic workloads include uniform random, all to all, bisection + pairing, and bit permutation. These workloads don't require simulation + of MPI operations, and could be used to generate background traffic + that can simulate interference with an application trace caused by + a production HPC system having a significant fraction of network nodes + being occupied. -**Uniform Random**: A network node is equally likely to send to any other -network node (traffic distributed throughout the network). + **Uniform Random**: A network node is equally likely to send to any other + network node (traffic distributed throughout the network). -**All to All**: Each network node communicates with all other network nodes. + **All to All**: Each network node communicates with all other network nodes. -**Nearest Neighbor**: A network node communicates with nearby network nodes -(or the ones that are at minimal number of hops). + **Nearest Neighbor**: A network node communicates with nearby network nodes + (or the ones that are at minimal number of hops). -**Permutation Traffic**: Source node sends all traffic to a single destination -based on a permutation matrix. + **Permutation Traffic**: Source node sends all traffic to a single destination + based on a permutation matrix. -**Bisection Pairing**: Node 0 communicates with Node 'n', Node 1 with 'n-1', -and so on. + **Bisection Pairing**: Node 0 communicates with Node 'n', Node 1 with 'n-1', + and so on. 
-HPC Application Traces -^^^^^^^^^^^^^^^^^^^^^^ + HPC Application Traces + ^^^^^^^^^^^^^^^^^^^^^^ -Application traces are captured by running an MPI program. They are -useful for network performance prediction of production HPC applications. -Trace sizes can be large for long running or communication intensive -applications, but they have the potential to capture computation-communication -interplay. These workloads require accurate simulation of MPI operations, and -simulation results can be complex to analyze. + Application traces are captured by running an MPI program. They are + useful for network performance prediction of production HPC applications. + Trace sizes can be large for long running or communication intensive + applications, but they have the potential to capture computation-communication + interplay. These workloads require accurate simulation of MPI operations, and + simulation results can be complex to analyze. -3. Workload creation +#. Workload creation -------------------- -A workload can be created by capturing application traces from -running an MPI program. Options for capturing a trace include -using DUMPI, :ref:`userguide-score-p`, and :ref:`userguide-bigsim`. - -Information in a Typical Trace -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -A typical trace captured (e.g. in DUMPI, OTF2, BigSim) for an -MPI program contains information on the operations that occur -at different times with critical information for the operation. -The table below gives an example of a typical trace. 
- -=========================== ================ ========================================================= -Time stamp, t (rounded off) Operation type Operation data (only critical information is highlighted) -=========================== ================ ========================================================= -t = 10 MPI_Bcast root, size of bcast, communicator -t = 10.5 MPI_Irecv source, tag, communicator, req ID -t = 10.51 user_computation optional region name - "boundary updates" -t = 12.51 MPI_Isend dest, tag, communicator, req ID -t = 12.53 user_computation optional region name - "core updates" -t = 22.53 MPI_Waitall req IDs -t = 25 MPI_Barrier communicator -=========================== ================ ========================================================= - -Effect of Replaying Traces -^^^^^^^^^^^^^^^^^^^^^^^^^^ - -As shown in the table below, replaying a trace can result in -different results from the original run due to different configurations -resulting in operations taking more or less time to run. - -==================== ================= =============== ============ ================ -Original time stamps Original duration New time stamps New duration Operation type -==================== ================= =============== ============ ================ -10 0.5 10 0.2 MPI_Bcast -10.5 0.01 10.2 0.01 MPI_Irecv -10.51 2 10.21 2 user_computation -12.51 0.02 12.21 0.02 MPI_Isend -12.53 10 12.23 10 user_computation -22.53 2.47 22.23 0.03 MPI_Waitall -25 1 22.26 1.7 MPI_Barrier -==================== ================= =============== ============ ================ - -In addition to the affect of the network configuration, different trace -formats may result in different results. - -As an example, DUMPI stores all the information passed to MPI calls. The -simulation then decides which request to fulfill, allowing accurate resolution -for the target systems. 
If the control flow of the program can change -significantly due to the ordering of operations, then simulations are not -entirely correct. - -On the other hand, OTF2 stores only the information that is used (e.g. which -request was satisfied). This accurately mimics the control flow of the trace -run, but does not accurately represent execution for the target system. - -These differences are artifacts of leveraging existing tools not originally -intended for Parallel Discrete Event Simulation (PDES). - -4. Execution + A workload can be created by capturing application traces from + running an MPI program. Options for capturing a trace include + using DUMPI, :ref:`userguide-score-p`, and :ref:`userguide-bigsim`. + + Information in a Typical Trace + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + A typical trace captured (e.g. in DUMPI, OTF2, BigSim) for an + MPI program contains information on the operations that occur + at different times with critical information for the operation. + The table below gives an example of a typical trace. 
+ + =========================== ================ ========================================================= + Time stamp, t (rounded off) Operation type Operation data (only critical information is highlighted) + =========================== ================ ========================================================= + t = 10 MPI_Bcast root, size of bcast, communicator + t = 10.5 MPI_Irecv source, tag, communicator, req ID + t = 10.51 user_computation optional region name - "boundary updates" + t = 12.51 MPI_Isend dest, tag, communicator, req ID + t = 12.53 user_computation optional region name - "core updates" + t = 22.53 MPI_Waitall req IDs + t = 25 MPI_Barrier communicator + =========================== ================ ========================================================= + + Effect of Replaying Traces + ^^^^^^^^^^^^^^^^^^^^^^^^^^ + + As shown in the table below, replaying a trace can result in + different results from the original run due to different configurations + resulting in operations taking more or less time to run. + + ==================== ================= =============== ============ ================ + Original time stamps Original duration New time stamps New duration Operation type + ==================== ================= =============== ============ ================ + 10 0.5 10 0.2 MPI_Bcast + 10.5 0.01 10.2 0.01 MPI_Irecv + 10.51 2 10.21 2 user_computation + 12.51 0.02 12.21 0.02 MPI_Isend + 12.53 10 12.23 10 user_computation + 22.53 2.47 22.23 0.03 MPI_Waitall + 25 1 22.26 1.7 MPI_Barrier + ==================== ================= =============== ============ ================ + + In addition to the affect of the network configuration, different trace + formats may result in different results. + + As an example, DUMPI stores all the information passed to MPI calls. The + simulation then decides which request to fulfill, allowing accurate resolution + for the target systems. 
If the control flow of the program can change + significantly due to the ordering of operations, then simulations are not + entirely correct. + + On the other hand, OTF2 stores only the information that is used (e.g. which + request was satisfied). This accurately mimics the control flow of the trace + run, but does not accurately represent execution for the target system. + + These differences are artifacts of leveraging existing tools not originally + intended for Parallel Discrete Event Simulation (PDES). + +#. Execution ------------ -The user guide :ref:`userguide-quickstart` section shows the -arguments taken by TraceR and some of the options available -to control execution of a simulation. + The user guide :ref:`userguide-quickstart` section shows the + arguments taken by TraceR and some of the options available + to control execution of a simulation. diff --git a/docs/tutorial/workflow.rst b/docs/tutorial/workflow.rst index a7bc29f..c0441c5 100644 --- a/docs/tutorial/workflow.rst +++ b/docs/tutorial/workflow.rst @@ -22,8 +22,8 @@ is used for simulating the communication on the network. shows how a program would be compiled to generate BigSim traces, and the stencil4d-otf example shows how to compile a program for generating OTF2 traces. - .. note:: - If you're using BigSim, avoid global variables in your MPI application so that it can be run with virtualization. + .. note:: + If you're using BigSim, avoid global variables in your MPI application so that it can be run with virtualization. 2. Generate traces. 
For instructions on generating OTF2 traces, see the user guide section on using :ref:`userguide-score-p`, or for using BigSim traces see the section in From cc053ba571848114a32b77d353a40eec24e5a32e Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Tue, 9 Jul 2019 15:20:05 -0700 Subject: [PATCH 10/16] Some additional cleanup --- docs/tutorial.rst | 6 +- docs/tutorial/network_models.rst | 2 +- docs/tutorial/simulation_basics.rst | 194 ++++++++++++++-------------- 3 files changed, 101 insertions(+), 101 deletions(-) diff --git a/docs/tutorial.rst b/docs/tutorial.rst index f26fc3c..99009b7 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -24,9 +24,9 @@ Interconnects (HOTI). Aug 28, 2017, Santa Clara, CA, USA. These guides will give some of the basics needed to use TraceR. - 1. :ref:`tutorial/network_models` - 2. :ref:`tutorial/simulation_basics` - 3. :ref:`tutorial/workflow` + 1. :ref:`tutorial-network-models` + 2. :ref:`tutorial-simulation-basics` + 3. :ref:`tutorial-workflow` Full contents: diff --git a/docs/tutorial/network_models.rst b/docs/tutorial/network_models.rst index be071c5..d288af9 100644 --- a/docs/tutorial/network_models.rst +++ b/docs/tutorial/network_models.rst @@ -8,7 +8,7 @@ supported by TraceR, as presented in the HOTI 25 tutorial (slides 22-39). For a more detailed guide, see the CODES wiki pages on network models at https://github.com/codes-org/codes/wiki/codes-networks. Any commands/examples in this section are referring to files -included in the CODES git repository (not TraceR). +included in the `CODES git repository `_ (not TraceR). Overview -------- diff --git a/docs/tutorial/simulation_basics.rst b/docs/tutorial/simulation_basics.rst index 009ad62..7f5908d 100644 --- a/docs/tutorial/simulation_basics.rst +++ b/docs/tutorial/simulation_basics.rst @@ -12,117 +12,117 @@ Creating a network simulation can be broken down into 4 steps: 1. 
Prototype the system design ------------------------------ - An overview of setup using network parameters was given - in the :ref:`tutorial-network-models` guide. +An overview of setup using network parameters was given +in the :ref:`tutorial-network-models` guide. -#. Workload selection +2. Workload selection --------------------- - There are two types of workloads that can be used in a simulation, - synthetic workloads and HPC application traces. +There are two types of workloads that can be used in a simulation, +synthetic workloads and HPC application traces. - Synthetic Workloads - ^^^^^^^^^^^^^^^^^^^ +Synthetic Workloads +^^^^^^^^^^^^^^^^^^^ - Synthetic workloads follow specific communication patterns with a - constant injection rate. Often they are used to stress the network - topology to identify best and worst case performance. Examples of - synthetic workloads include uniform random, all to all, bisection - pairing, and bit permutation. These workloads don't require simulation - of MPI operations, and could be used to generate background traffic - that can simulate interference with an application trace caused by - a production HPC system having a significant fraction of network nodes - being occupied. +Synthetic workloads follow specific communication patterns with a +constant injection rate. Often they are used to stress the network +topology to identify best and worst case performance. Examples of +synthetic workloads include uniform random, all to all, bisection +pairing, and bit permutation. These workloads don't require simulation +of MPI operations, and could be used to generate background traffic +that can simulate interference with an application trace caused by +a production HPC system having a significant fraction of network nodes +being occupied. - **Uniform Random**: A network node is equally likely to send to any other - network node (traffic distributed throughout the network). 
+**Uniform Random**: A network node is equally likely to send to any other +network node (traffic distributed throughout the network). - **All to All**: Each network node communicates with all other network nodes. +**All to All**: Each network node communicates with all other network nodes. - **Nearest Neighbor**: A network node communicates with nearby network nodes - (or the ones that are at minimal number of hops). +**Nearest Neighbor**: A network node communicates with nearby network nodes +(or the ones that are at minimal number of hops). - **Permutation Traffic**: Source node sends all traffic to a single destination - based on a permutation matrix. +**Permutation Traffic**: Source node sends all traffic to a single destination +based on a permutation matrix. - **Bisection Pairing**: Node 0 communicates with Node 'n', Node 1 with 'n-1', - and so on. +**Bisection Pairing**: Node 0 communicates with Node 'n', Node 1 with 'n-1', +and so on. - HPC Application Traces - ^^^^^^^^^^^^^^^^^^^^^^ +HPC Application Traces +^^^^^^^^^^^^^^^^^^^^^^ - Application traces are captured by running an MPI program. They are - useful for network performance prediction of production HPC applications. - Trace sizes can be large for long running or communication intensive - applications, but they have the potential to capture computation-communication - interplay. These workloads require accurate simulation of MPI operations, and - simulation results can be complex to analyze. +Application traces are captured by running an MPI program. They are +useful for network performance prediction of production HPC applications. +Trace sizes can be large for long running or communication intensive +applications, but they have the potential to capture computation-communication +interplay. These workloads require accurate simulation of MPI operations, and +simulation results can be complex to analyze. -#. Workload creation +3. 
Workload creation -------------------- - A workload can be created by capturing application traces from - running an MPI program. Options for capturing a trace include - using DUMPI, :ref:`userguide-score-p`, and :ref:`userguide-bigsim`. - - Information in a Typical Trace - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - - A typical trace captured (e.g. in DUMPI, OTF2, BigSim) for an - MPI program contains information on the operations that occur - at different times with critical information for the operation. - The table below gives an example of a typical trace. - - =========================== ================ ========================================================= - Time stamp, t (rounded off) Operation type Operation data (only critical information is highlighted) - =========================== ================ ========================================================= - t = 10 MPI_Bcast root, size of bcast, communicator - t = 10.5 MPI_Irecv source, tag, communicator, req ID - t = 10.51 user_computation optional region name - "boundary updates" - t = 12.51 MPI_Isend dest, tag, communicator, req ID - t = 12.53 user_computation optional region name - "core updates" - t = 22.53 MPI_Waitall req IDs - t = 25 MPI_Barrier communicator - =========================== ================ ========================================================= - - Effect of Replaying Traces - ^^^^^^^^^^^^^^^^^^^^^^^^^^ - - As shown in the table below, replaying a trace can result in - different results from the original run due to different configurations - resulting in operations taking more or less time to run. 
- - ==================== ================= =============== ============ ================ - Original time stamps Original duration New time stamps New duration Operation type - ==================== ================= =============== ============ ================ - 10 0.5 10 0.2 MPI_Bcast - 10.5 0.01 10.2 0.01 MPI_Irecv - 10.51 2 10.21 2 user_computation - 12.51 0.02 12.21 0.02 MPI_Isend - 12.53 10 12.23 10 user_computation - 22.53 2.47 22.23 0.03 MPI_Waitall - 25 1 22.26 1.7 MPI_Barrier - ==================== ================= =============== ============ ================ - - In addition to the affect of the network configuration, different trace - formats may result in different results. - - As an example, DUMPI stores all the information passed to MPI calls. The - simulation then decides which request to fulfill, allowing accurate resolution - for the target systems. If the control flow of the program can change - significantly due to the ordering of operations, then simulations are not - entirely correct. - - On the other hand, OTF2 stores only the information that is used (e.g. which - request was satisfied). This accurately mimics the control flow of the trace - run, but does not accurately represent execution for the target system. - - These differences are artifacts of leveraging existing tools not originally - intended for Parallel Discrete Event Simulation (PDES). - -#. Execution +A workload can be created by capturing application traces from +running an MPI program. Options for capturing a trace include +using DUMPI, :ref:`userguide-score-p`, and :ref:`userguide-bigsim`. + +Information in a Typical Trace +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A typical trace captured (e.g. in DUMPI, OTF2, BigSim) for an +MPI program contains information on the operations that occur +at different times with critical information for the operation. +The table below gives an example of a typical trace. 
+ +=========================== ================ ========================================================= +Time stamp, t (rounded off) Operation type Operation data (only critical information is highlighted) +=========================== ================ ========================================================= +t = 10 MPI_Bcast root, size of bcast, communicator +t = 10.5 MPI_Irecv source, tag, communicator, req ID +t = 10.51 user_computation optional region name - "boundary updates" +t = 12.51 MPI_Isend dest, tag, communicator, req ID +t = 12.53 user_computation optional region name - "core updates" +t = 22.53 MPI_Waitall req IDs +t = 25 MPI_Barrier communicator +=========================== ================ ========================================================= + +Effect of Replaying Traces +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +As shown in the table below, replaying a trace can result in +different results from the original run due to different configurations +resulting in operations taking more or less time to run. + +==================== ================= =============== ============ ================ +Original time stamps Original duration New time stamps New duration Operation type +==================== ================= =============== ============ ================ +10 0.5 10 0.2 MPI_Bcast +10.5 0.01 10.2 0.01 MPI_Irecv +10.51 2 10.21 2 user_computation +12.51 0.02 12.21 0.02 MPI_Isend +12.53 10 12.23 10 user_computation +22.53 2.47 22.23 0.03 MPI_Waitall +25 1 22.26 1.7 MPI_Barrier +==================== ================= =============== ============ ================ + +In addition to the affect of the network configuration, different trace +formats may result in different results. + +As an example, DUMPI stores all the information passed to MPI calls. The +simulation then decides which request to fulfill, allowing accurate resolution +for the target systems. 
If the control flow of the program can change +significantly due to the ordering of operations, then simulations are not +entirely correct. + +On the other hand, OTF2 stores only the information that is used (e.g. which +request was satisfied). This accurately mimics the control flow of the trace +run, but does not accurately represent execution for the target system. + +These differences are artifacts of leveraging existing tools not originally +intended for Parallel Discrete Event Simulation (PDES). + +4. Execution ------------ - The user guide :ref:`userguide-quickstart` section shows the - arguments taken by TraceR and some of the options available - to control execution of a simulation. +The user guide :ref:`userguide-quickstart` section shows the +arguments taken by TraceR and some of the options available +to control execution of a simulation. From 181d9a1d77dcc6423cf49863d10d2cd1587a6aad Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Tue, 9 Jul 2019 15:22:22 -0700 Subject: [PATCH 11/16] Remove link from citation --- docs/tutorial.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/tutorial.rst b/docs/tutorial.rst index 99009b7..5379340 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -14,9 +14,8 @@ Tutorial `Download Slides `_. **Full citation:** Nikhil Jain and Misbah Mubarak. -`CODES-TRACER Tutorial: Enabling HPC Design Space -Exploration via Discrete-Event Simulation -`_. +CODES-TRACER Tutorial: Enabling HPC Design Space +Exploration via Discrete-Event Simulation. Tutorial presented at 25th Annual Symposium on High Performance Interconnects (HOTI). Aug 28, 2017, Santa Clara, CA, USA. 
From 803a09b6cb717709b4c26b5d7f75398c6fccf11f Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Wed, 10 Jul 2019 09:47:41 -0700 Subject: [PATCH 12/16] Add explanation of changes to look for in the trace replay timing table --- docs/tutorial/simulation_basics.rst | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/tutorial/simulation_basics.rst b/docs/tutorial/simulation_basics.rst index 7f5908d..7129eef 100644 --- a/docs/tutorial/simulation_basics.rst +++ b/docs/tutorial/simulation_basics.rst @@ -90,7 +90,10 @@ Effect of Replaying Traces As shown in the table below, replaying a trace can result in different results from the original run due to different configurations -resulting in operations taking more or less time to run. +resulting in operations taking more or less time to run. In the first +line and last line, the MPI_Bcast and MPI_Waitall operations are faster +in the replayed trace, resulting in subsequent operations happening at +earlier times than when the trace was captured. ==================== ================= =============== ============ ================ Original time stamps Original duration New time stamps New duration Operation type @@ -107,13 +110,13 @@ Original time stamps Original duration New time stamps New duration Oper In addition to the affect of the network configuration, different trace formats may result in different results. -As an example, DUMPI stores all the information passed to MPI calls. The +As an example, DUMPI traces store all the information passed to MPI calls. The simulation then decides which request to fulfill, allowing accurate resolution for the target systems. If the control flow of the program can change significantly due to the ordering of operations, then simulations are not entirely correct. -On the other hand, OTF2 stores only the information that is used (e.g. which +On the other hand, OTF2 traces store only the information that is used (e.g. which request was satisfied). 
This accurately mimics the control flow of the trace run, but does not accurately represent execution for the target system. From d60960c134cd04e61e5ddf4501a85efe7719b9e8 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Wed, 10 Jul 2019 10:11:51 -0700 Subject: [PATCH 13/16] Fix description of table lines referred to --- docs/tutorial/simulation_basics.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/tutorial/simulation_basics.rst b/docs/tutorial/simulation_basics.rst index 7129eef..b80e5a6 100644 --- a/docs/tutorial/simulation_basics.rst +++ b/docs/tutorial/simulation_basics.rst @@ -91,9 +91,9 @@ Effect of Replaying Traces As shown in the table below, replaying a trace can result in different results from the original run due to different configurations resulting in operations taking more or less time to run. In the first -line and last line, the MPI_Bcast and MPI_Waitall operations are faster -in the replayed trace, resulting in subsequent operations happening at -earlier times than when the trace was captured. +and 2nd to last table entries, the MPI_Bcast and MPI_Waitall operations +are faster in the replayed trace, resulting in subsequent operations +happening at earlier times than when the trace was captured. ==================== ================= =============== ============ ================ Original time stamps Original duration New time stamps New duration Operation type From 60a8c8953c5a623825247a9b16b8de20477140b2 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Fri, 6 Sep 2019 09:54:15 -0700 Subject: [PATCH 14/16] Tweak link to quickstart section of the userguide --- docs/tutorial/workflow.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/tutorial/workflow.rst b/docs/tutorial/workflow.rst index c0441c5..8f56014 100644 --- a/docs/tutorial/workflow.rst +++ b/docs/tutorial/workflow.rst @@ -33,8 +33,8 @@ is used for simulating the communication on the network. 
Optionally, mapping files can also be provided. See :ref:`userguide-tracer-config-file`, :ref:`userguide-codes-config-file`, and :ref:`userguide-job-placement-file` in the user guide for instructions on creating the files. -4. Run the simulation using ``mpirun``. For details on options available, see the - :ref:`quickstart section of the user guide `. This command will +4. Run the simulation using ``mpirun``. For details on options available, see + :ref:`userguide-quickstart` in the user guide. This command will run a simulation in optimistic mode:: - mpirun -np

../traceR --sync=3 -- \ No newline at end of file + mpirun -np

../traceR --sync=3 -- From b8ba1ca2f476471edbe22ae30cae0093ecef3b65 Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Fri, 6 Sep 2019 10:11:48 -0700 Subject: [PATCH 15/16] Tweak syntax for the userguide-quickstart section label --- docs/tutorial/workflow.rst | 2 +- docs/userguide.rst | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/tutorial/workflow.rst b/docs/tutorial/workflow.rst index 8f56014..8ed1b24 100644 --- a/docs/tutorial/workflow.rst +++ b/docs/tutorial/workflow.rst @@ -33,7 +33,7 @@ is used for simulating the communication on the network. Optionally, mapping files can also be provided. See :ref:`userguide-tracer-config-file`, :ref:`userguide-codes-config-file`, and :ref:`userguide-job-placement-file` in the user guide for instructions on creating the files. -4. Run the simulation using ``mpirun``. For details on options available, see +4. Run the simulation using ``mpirun``. For details on options available, see the :ref:`userguide-quickstart` in the user guide. This command will run a simulation in optimistic mode:: diff --git a/docs/userguide.rst b/docs/userguide.rst index 33fd74d..92827a4 100644 --- a/docs/userguide.rst +++ b/docs/userguide.rst @@ -5,6 +5,7 @@ Below, we provide detailed instructions for how to start doing network simulations using TraceR. .. _userguide-quickstart: + Quickstart ---------- @@ -42,4 +43,4 @@ Generating Traces .. include:: userguide/score-p.rst .. _userguide-bigsim: -.. include:: userguide/bigsim.rst \ No newline at end of file +.. 
include:: userguide/bigsim.rst From bdc3035bfeb444f3f4f23eff77894a9b72b2e1bd Mon Sep 17 00:00:00 2001 From: Ryan Mast Date: Fri, 6 Sep 2019 11:00:28 -0700 Subject: [PATCH 16/16] Tweak the wording slightly for the link to the userguide quickstart section --- docs/tutorial/workflow.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorial/workflow.rst b/docs/tutorial/workflow.rst index 8ed1b24..902c932 100644 --- a/docs/tutorial/workflow.rst +++ b/docs/tutorial/workflow.rst @@ -34,7 +34,7 @@ is used for simulating the communication on the network. and :ref:`userguide-job-placement-file` in the user guide for instructions on creating the files. 4. Run the simulation using ``mpirun``. For details on options available, see the - :ref:`userguide-quickstart` in the user guide. This command will + :ref:`userguide-quickstart` section in the user guide. This command will run a simulation in optimistic mode:: mpirun -np

../traceR --sync=3 --