Add scripts for running Linux perf.

We also modify build-all.sh to give greater flexibility when building QEMU.

* .gitignore: Ignore generated results.
* build-all.sh: Add options to control building of QEMU.
* run-spec-pop2.sh: Created.

memcpy-benchmarks/

* .gitignore: Ignore standard results directories.
* README.md: Updated with details of Linux perf scripts.
* count-top-funcs.sh: Created.
* extract-top-level-funcs.sh: Created.
* profile-all-funcs.sh: Created.
* run-perf.sh: Created.

Signed-off-by: Jeremy Bennett <[email protected]>
embecosm · Sep 1, 2024 · ccc9dd1
1 parent 5174e91
commit ccc9dd1
Showing 9 changed files with 853 additions and 4 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,2 +1,16 @@
+# Git comparison files
+*.diff
+*.patch
+*.orig
+*.rej
 # Editor backup files
 *~
+# Generated logs and results files
+*.log
+*.res
+*.csv
+# Generated graphics
+*.png
+bm-graph-all-*/
+# Dump files
+*.dump
diff --git a/build-all.sh b/build-all.sh
@@ -15,9 +15,16 @@ usage () {
     cat <<EOF
 Usage ./build-all.sh                      : Build riscv64-unknown-linux-gnu
                                             tool chain and QEMU (default)
-                     [--only-qemu]        : Build just QEMU
+                     [--build-qemu]       : Build qemu-riscv32 and qemu-riscv64
                      [--build-clang]      : Build Clang/LLVM
                      [--build-gdbserver]  : Build gdbserver
+		     [--qemu-only]        : Only build qemu
+		     [--qemu-configs]     : Additional QEMU config options
+		     [--qemu-cflags]      : CFLAGS for building QEMU (default
+                                            "-Wno-error")
+                     [--profile-qemu]     : Enable profiling by gprof
+                     [--prefix <path>]    : Install path of the tool chain.
+                                            Default path is ../install
                      [--arch <arch>]      : Target architecture. Default
                                             architecture is rv64gc
                      [--abi <abi>]        : Target ABI. Default ABI is lp64d
@@ -33,6 +40,7 @@ Usage ./build-all.sh                      : Build riscv64-unknown-linux-gnu
                      [--clean]            : Delete build directories in
                                             riscv-gnu-toolchain and the install
                                             directory before building
+                     [--clean-qemu]       : Clean just the QEMU build
                      [--help]             : Print this message and exit
 EOF
 }
@@ -51,9 +59,13 @@ DEFAULTTRIPLE=riscv64-unknown-elf
 
 build_linux=true
 qemu_only=false
+qemu_configs=""
+qemu_cflags="-Wno-error"
+profile_qemu=""
 build_gdbserver=false
 build_clang=false
 clean_build=false
+clean_qemu_build=false
 enable_multilib=true
 print_help=false
 print_hashes=false
@@ -110,6 +122,17 @@ until
       --qemu-only)
 	  qemu_only=true
 	  ;;
+      --qemu-configs)
+	  shift
+	  qemu_configs="$1"
+	  ;;
+      --qemu-cflags)
+	  shift
+	  qemu_cflags="$1"
+	  ;;
+      --profile-qemu)
+	  profile_qemu="--enable-gprof"
+	  ;;
       --build-gdbserver)
 	  build_gdbserver=true
 	  ;;
@@ -156,6 +179,10 @@ until
 	  ;;
       --clean)
 	  clean_build=true
+	  clean_qemu_build=true
+	  ;;
+      --clean-qemu)
+	  clean_qemu_build=true
 	  ;;
       --help)
 	  print_help=true
@@ -267,6 +294,15 @@ else
   EXTRA_OPTS="${EXTRA_OPTS} --disable-multilib"
 fi
 echo "  build qemu: yes"
+echo "   qemu_configs: ${qemu_configs}"
+echo "   qemu_cflags: ${qemu_cflags}"
+if ${clean_qemu_build}
+then
+   echo "   qemu_clean: yes"
+else
+   echo "   qemu_clean: no"
+fi
+
 if ${build_gdbserver}
 then
   echo "  build gdbserver: yes"
@@ -283,7 +319,7 @@ fi
 cd $TOPDIR/riscv-gnu-toolchain
 
 log_file="${LOGDIR}/clean-toolchain.log"
-if ${clean_build}
+if ${clean_build} && ! ${qemu_only}
 then
   echo
   echo "Cleaning...                            logging to ${log_file}"
@@ -362,8 +398,14 @@ echo "Building QEMU...                 logging to ${log_file}"
   $TOPDIR/qemu/configure --prefix=$INSTALLDIR \
 	  --target-list=riscv64-linux-user,riscv32-linux-user \
 	  --interp-prefix=$INSTALLDIR/sysroot \
-	  --python=python3 \
-	  --extra-cflags="-Wno-error"
+	  --python=python3 ${profile_qemu} \
+	  ${qemu_configs} \
+	  --extra-cflags="${qemu_cflags}"
+  if ${clean_build} || ${clean_qemu_build}
+  then
+      rm -f ${INSTALLDIR}/bin/qemu-riscv??
+      make clean
+  fi
   make -j $(nproc)
   make install
 ) > ${log_file} 2>&1
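
The new QEMU options in this hunk feed straight into the `configure` line. As a minimal sketch of that flow (not part of `build-all.sh`; the helper names `parse_qemu_opts` and `qemu_extra_args` are hypothetical, and `--enable-debug` is only an example value for `--qemu-configs`), the following mirrors the new `case` arms and prints the extra arguments that would be appended:

```shell
#!/bin/sh
# Hypothetical helper mirroring the new option cases in build-all.sh.
parse_qemu_opts () {
    qemu_configs=""
    qemu_cflags=""
    profile_qemu=""
    clean_qemu_build=false
    while [ $# -gt 0 ]
    do
        case "$1" in
            --qemu-configs)
                # Options taking a value need a second word to consume
                [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
                qemu_configs="$2"
                shift
                ;;
            --qemu-cflags)
                [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
                qemu_cflags="$2"
                shift
                ;;
            --profile-qemu)
                profile_qemu="--enable-gprof"
                ;;
            --clean-qemu)
                clean_qemu_build=true
                ;;
            *)
                echo "unknown option: $1" >&2
                return 1
                ;;
        esac
        shift
    done
}

# Hypothetical helper: print the extra arguments for QEMU's configure.
# profile_qemu and qemu_configs are deliberately unquoted, as in the
# diff, so empty values vanish and multiple options split into words.
qemu_extra_args () {
    echo ${profile_qemu} ${qemu_configs} "--extra-cflags=${qemu_cflags}"
}

parse_qemu_opts --profile-qemu --qemu-configs "--enable-debug" \
                --qemu-cflags "-Wno-error -g"
qemu_extra_args
# prints: --enable-gprof --enable-debug --extra-cflags=-Wno-error -g
```

Because `qemu_configs` is expanded unquoted, several configure options can be passed as one space separated string, but an individual option containing spaces cannot.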

diff --git a/memcpy-benchmarks/.gitignore b/memcpy-benchmarks/.gitignore
@@ -3,3 +3,12 @@
 *.exe
 *.icount
 *.check
+# Generated data
+*.csv
+*.res
+perf.data
+perf.data.old
+gmon.out
+# Standard directories for generated data
+res-baseline
+res-development
diff --git a/memcpy-benchmarks/README.md b/memcpy-benchmarks/README.md
@@ -30,3 +30,143 @@ option to see arguments and the comments in the script.
 The `run-sequence.sh` script will run a large number of benchmarks for
 different values of VLEN and LMUL and for a range of data sizes.  Again use
 the `--help` option to see arguments and look at the comments in the script.
+
+## Scripts to help with Linux _perf_
+
+### Prerequisites
+
+The scripts are intended to run under Linux.  Prerequisites are Linux _perf_ and
+_csvtool_, both of which should be available with standard distributions.
+
+### `run-perf.sh`
+
+```
+./run-perf.sh [--bytes <num>] [--resdir <dir>] [--sizes <list>]
+```
+
+Uses Linux _perf_ to profile different variants of the `memcpy` benchmark.
+Arguments are as follows.
+
+- `--bytes` _num_ : Total bytes to copy (optional).  Default 1,000,000,000.
+
+- `--resdir` _dir_ : Directory in which to place the results (optional).
+  Default is `res-baseline` in the directory holding this script.
+
+- `--sizes` _list_ : Space separated list of the data sizes to use when
+  creating results (optional).  Default list is all the powers of 2, 3, 5 and
+  7 up to 5<sup>6</sup>.
+
+The results will be three sets of files of the form
+`prof-`_type_`-`_size_`.res`, where _type_ is one of `scalar`, `vector-small`
+or `vector-large`, and _size_ is the size in bytes of the data block copied on
+each iteration.
+
+The total number of iterations for each test is determined by the number given
+in the `--bytes` argument divided by the size of the data block being used for
+the run.
+
+`perf record` is run using DWARF to determine the call graph.  This gives
+accurate results, but is slow.  Expect each iteration to take of the order of
+20 minutes on a decent server.
+
+### `extract-top-level-funcs.sh`
+
+```
+./extract-top-level-funcs.sh --resfile <file> [--cutoff <num>] \
+    [--total|--self] [--omit-empty] [--md | --csv]
+```
+
+Extract the main results from a file generated by `run-perf.sh`.  Arguments
+are as follows.
+
+- `--resfile` _file_: Target file to extract results from (mandatory)
+- `--cutoff` _num_: Percentage below which to stop showing results
+  (optional). Default value 1
+- `--total`: Cutoff and sorting based on total time (self + children)
+  (optional). Set by default.
+- `--self`: Cutoff and sorting just based on self time (no children)
+  (optional). Opposite to `--total`, so not set by default.
+- `--omit-empty`: Do not show results if self is 0.00 (optional).  Only has
+  any effect in combination with `--total`.
+- `--md`: Output results in Markdown format (optional). Set by default.
+- `--csv`: Output results in CSV format (optional).
+
+**Note.** Only one of `--total` or `--self` may be specified.  Only one of
+`--md` or `--csv` may be specified.
+
+This is the central script for extracting data from the Linux _perf_ results.
+In general using `--self` gives the most useful data for targeting
+optimizations. Using `--total` will flag up these functions, but also
+functions which are just wrappers for other functions.  The `--omit-empty`
+option can be helpful when using `--total` to skip functions which are purely
+wrapping other functions.
+
+### `count-top-funcs.sh`
+
+```
+./count-top-funcs.sh [--resdir <dir>] [--total|--self] [--md | --csv]
+```
+
+Find the frequency of the most used functions in a set of data.  This is a
+wrapper for `extract-top-level-funcs.sh`.  Arguments are as follows.
+
+- `--resdir` _dir_: Directory with the results to be analysed (optional).
+  Default `res-baseline`
+- `--total`: Cutoff and sorting based on total time (self + children)
+  (optional). Set by default.
+- `--self`: Cutoff and sorting just based on self time (no children)
+  (optional). Opposite to `--total`, so not set by default.
+- `--md`: Output results in Markdown format (optional). Set by default.
+- `--csv`: Output results in CSV format (optional).
+
+**Note.** Only one of `--total` or `--self` may be specified.  Only one of
+`--md` or `--csv` may be specified.
+
+The results to be analysed will be in files of the form
+`prof-`_type_`-`_size_`.res`, where _type_ is one of `scalar`, `vector-small`
+or `vector-large`, and _size_ is the size in bytes of the data block copied
+on each iteration.
+
+### `profile-all-funcs.sh`
+
+```
+./profile-all-funcs.sh [--resdir <dir>] [--type <str>] [--total|--self] \
+    [--funclist <list>]
+```
+
+Extract data on function usage for different data sizes in a form suitable for
+graphical analysis.  Arguments are as follows.
+
+- `--resdir` _dir_: Directory with the results to be analysed (optional).
+  Default `res-baseline`.
+- `--type` _str_: What type of result to look at (optional).  Permitted values
+  are `scalar` (default), `vector-small` or `vector-large`.
+- `--total`: Cutoff and sorting based on total time (self + children)
+  (optional). Set by default.
+- `--self`: Cutoff and sorting just based on self time (no children)
+  (optional). Opposite to `--total`, so not set by default.
+- `--funclist` _list_: Space separated list of functions to profile
+  (optional). Default value `helper_lookup_tb_ptr cpu_get_tb_cpu_state`.
+
+**Note.** Only one of `--total` or `--self` may be specified.
+
+This script typically takes a set of functions identified by
+`count-top-funcs.sh`.  The output is always CSV format.
+
+### `run-spec-pop2.sh`
+
+```
+./run-spec-pop2.sh [--reportfile <file>] [--specdir <dir>]
+```
+
+**Note.** Because this is not specific to the `memcpy` benchmarks it lives in
+the main `tooling` repository.  Arguments are as follows.
+
+- `--reportfile` _file_: Put the results in this file. Default
+  `prof-628.pop2_s.res` in the `tooling` repository
+
+- `--specdir` _dir_: The directory holding the SPEC installation to be used.
+
+This script runs the SPEC CPU 2017 `628.pop2_s` benchmark under QEMU with Linux
+_perf_.  The script runs a previously built benchmark.  If necessary use
+`runspec-qemu.sh` to create the benchmark binary.
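
The `--bytes` and `--sizes` defaults documented for `run-perf.sh` in the README hunk above can be illustrated with a minimal sketch. It is not taken from `run-perf.sh`: the function names `default_sizes` and `iterations` are hypothetical, and it assumes that the powers start at exponent 1 and that the iteration count uses integer division.

```shell
#!/bin/sh
# Hypothetical sketch, not taken from run-perf.sh.
limit=15625            # 5^6, the documented upper bound on block size
total_bytes=1000000000 # documented default for --bytes

# List every power of 2, 3, 5 and 7 up to the limit, sorted numerically.
# Assumption: exponents start at 1, so a size of 1 is not included.
default_sizes () {
    for base in 2 3 5 7
    do
        p=${base}
        while [ "${p}" -le "${limit}" ]
        do
            echo "${p}"
            p=$((p * base))
        done
    done | sort -n | tr '\n' ' '
}

# Iterations for one run: total bytes divided by block size.
# Assumption: integer division, so any remainder is dropped.
iterations () {
    echo $(( $1 / $2 ))
}

echo "sizes: $(default_sizes)"
for size in 1024 15625
do
    echo "size ${size}: $(iterations "${total_bytes}" "${size}") iterations"
done
# size 1024: 976562 iterations
# size 15625: 64000 iterations
```

Under these assumptions the default list holds 31 sizes, which with three benchmark variants would mean 93 result files per results directory.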