Add scripts for running Linux perf.
We also modify build-all.sh to give greater flexibility when building QEMU.

	* .gitignore: Ignore generated results.
	* build-all.sh: Add options to control building of QEMU.
	* run-spec-pop2.sh: Created.

memcpy-benchmarks/

	* .gitignore: Ignore standard results directories.
	* README.md: Updated with details of Linux perf scripts.
	* count-top-funcs.sh: Created.
	* extract-top-level-funcs.sh: Created.
	* profile-all-funcs.sh: Created.
	* run-perf.sh: Created.

Signed-off-by: Jeremy Bennett <[email protected]>
jeremybennett committed Sep 1, 2024
1 parent 5174e91 commit ccc9dd1
Showing 9 changed files with 853 additions and 4 deletions.
14 changes: 14 additions & 0 deletions .gitignore
@@ -1,2 +1,16 @@
# Git comparison files
*.diff
*.patch
*.orig
*.rej
# Editor backup files
*~
# Generated logs and results files
*.log
*.res
*.csv
# Generated graphics
*.png
bm-graph-all-*/
# Dump files
*.dump
50 changes: 46 additions & 4 deletions build-all.sh
@@ -15,9 +15,16 @@ usage () {
cat <<EOF
Usage ./build-all.sh : Build riscv64-unknown-linux-gnu
tool chain and QEMU (default)
[--only-qemu] : Build just QEMU
[--build-qemu] : Build qemu-riscv32 and qemu-riscv64
[--build-clang] : Build Clang/LLVM
[--build-gdbserver] : Build gdbserver
[--qemu-only] : Only build qemu
[--qemu-configs] : Additional QEMU config options
[--qemu-cflags] : CFLAGS for building QEMU (default
"-Wno-error")
[--profile-qemu] : Enable profiling with gprof
[--prefix <path>] : Install path of the tool chain.
Default path is ../install
[--arch <arch>] : Target architecture. Default
architecture is rv64gc
[--abi <abi>] : Target ABI. Default ABI is lp64d
@@ -33,6 +40,7 @@ Usage ./build-all.sh : Build riscv64-unknown-linux-gnu
[--clean] : Delete build directories in
riscv-gnu-toolchain and the install
directory before building
[--clean-qemu] : Clean just the QEMU build
[--help] : Print this message and exit
EOF
}
@@ -51,9 +59,13 @@ DEFAULTTRIPLE=riscv64-unknown-elf

build_linux=true
qemu_only=false
qemu_configs=""
qemu_cflags=""
profile_qemu=""
build_gdbserver=false
build_clang=false
clean_build=false
clean_qemu_build=false
enable_multilib=true
print_help=false
print_hashes=false
@@ -110,6 +122,17 @@ until
--qemu-only)
qemu_only=true
;;
--qemu-configs)
shift
qemu_configs="$1"
;;
--qemu-cflags)
shift
qemu_cflags="$1"
;;
--profile-qemu)
profile_qemu="--enable-gprof"
;;
--build-gdbserver)
build_gdbserver=true
;;
@@ -156,6 +179,10 @@ until
;;
--clean)
clean_build=true
clean_qemu_build=true
;;
--clean-qemu)
clean_qemu_build=true
;;
--help)
print_help=true
@@ -267,6 +294,15 @@ else
EXTRA_OPTS="${EXTRA_OPTS} --disable-multilib"
fi
echo " build qemu: yes"
echo " qemu_configs: ${qemu_configs}"
echo " qemu_cflags: ${qemu_cflags}"
if ${clean_qemu_build}
then
echo " qemu_clean: yes"
else
echo " qemu_clean: no"
fi

if ${build_gdbserver}
then
echo " build gdbserver: yes"
@@ -283,7 +319,7 @@ fi
cd $TOPDIR/riscv-gnu-toolchain

log_file="${LOGDIR}/clean-toolchain.log"
if ${clean_build}
if ${clean_build} && ! ${qemu_only}
then
echo
echo "Cleaning... logging to ${log_file}"
@@ -362,8 +398,14 @@ echo "Building QEMU... logging to ${log_file}"
$TOPDIR/qemu/configure --prefix=$INSTALLDIR \
--target-list=riscv64-linux-user,riscv32-linux-user \
--interp-prefix=$INSTALLDIR/sysroot \
--python=python3 \
--extra-cflags="-Wno-error"
--python=python3 ${profile_qemu} \
${qemu_configs} \
--extra-cflags="${qemu_cflags}"
if ${clean_build} || ${clean_qemu_build}
then
rm -f ${INSTALLDIR}/bin/qemu-riscv??
make clean
fi
make -j $(nproc)
make install
) > ${log_file} 2>&1
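
Note on the hunks above: the new options follow the script's existing pattern, where a flag that takes a value consumes the next argument. A minimal standalone sketch of that pattern follows; it is not the script itself. The function name `parse_qemu_opts` is hypothetical, and the sketch assumes the `-Wno-error` default advertised in the usage text, whereas the diff initializes `qemu_cflags` to an empty string.

```shell
# Sketch only: mirrors the option handling added to build-all.sh.
# parse_qemu_opts is a hypothetical name, not part of the commit.
parse_qemu_opts() {
    qemu_configs=""
    qemu_cflags="-Wno-error"   # assumption: default taken from the usage text
    profile_qemu=""
    clean_qemu_build=false
    while [ $# -gt 0 ]; do
        case "$1" in
            --qemu-configs)
                # Options with a value need a following argument
                [ $# -ge 2 ] || { echo "$1 needs a value" >&2; return 1; }
                qemu_configs="$2"
                shift
                ;;
            --qemu-cflags)
                [ $# -ge 2 ] || { echo "$1 needs a value" >&2; return 1; }
                qemu_cflags="$2"
                shift
                ;;
            --profile-qemu)
                profile_qemu="--enable-gprof"
                ;;
            --clean-qemu)
                clean_qemu_build=true
                ;;
            *)
                echo "unknown option: $1" >&2
                return 1
                ;;
        esac
        shift
    done
}

parse_qemu_opts --profile-qemu --qemu-cflags "-O2 -g" --clean-qemu
echo "profile_qemu=${profile_qemu}"
echo "qemu_cflags=${qemu_cflags}"
echo "clean_qemu_build=${clean_qemu_build}"
```

Checking for a following argument before consuming it avoids a failing `shift` when a value-taking option is given last.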
9 changes: 9 additions & 0 deletions memcpy-benchmarks/.gitignore
@@ -3,3 +3,12 @@
*.exe
*.icount
*.check
# Generated data
*.csv
*.res
perf.data
perf.data.old
gmon.out
# Standard directories for generated data
res-baseline
res-development
140 changes: 140 additions & 0 deletions memcpy-benchmarks/README.md
@@ -30,3 +30,143 @@ option to see arguments and the comments in the script.
The `run-sequence.sh` script will run a large number of benchmarks for
different values of VLEN and LMUL and for a range of data sizes. Again use
the `--help` option to see arguments and look at the comments in the script.

## Scripts to help with Linux _perf_

### Prerequisites

The scripts are intended to run under Linux. Prerequisites are Linux _perf_ and
_csvtool_, both of which should be available in standard distributions.

### `run-perf.sh`

```
./run-perf.sh [--bytes <num>] [--resdir <dir>] [--sizes <list>]
```

Uses Linux _perf_ to profile different variants of the `memcpy` benchmark.
Arguments are as follows.

- `--bytes` _num_ : Total bytes to copy (optional). Default 1,000,000,000.

- `--resdir` _dir_ : Directory in which to place the results (optional).
Default is `res-baseline` in the directory holding this script.

- `--sizes` _list_ : Space-separated list of the data sizes to use when
creating results (optional). Default list is all the powers of 2, 3, 5 and
7 up to 5<sup>6</sup> (15,625).
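
As an illustration of that default, the list can be generated as sketched below. This is not taken from `run-perf.sh`: the function name `default_sizes` is hypothetical, and the sketch assumes the list starts at the first power of each base (so 1 is excluded) and includes every power no larger than 15,625.

```shell
# Sketch only: one reading of "powers of 2, 3, 5 and 7 up to 5^6".
# default_sizes is a hypothetical name, not part of run-perf.sh.
default_sizes() {
    limit=15625                       # 5^6
    for base in 2 3 5 7; do
        val=${base}                   # assumption: start at base^1
        while [ "${val}" -le "${limit}" ]; do
            echo "${val}"
            val=$((val * base))
        done
    done | sort -n | paste -sd' ' -   # numeric order, joined on one line
}

default_sizes
```

The output is in the space-separated form that `--sizes` accepts.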

The results will be three sets of files of the form
`prof-`_type_`-`_size_`.res`, where _type_ is one of `scalar`, `vector-small`
or `vector-large`, and _size_ is the size in bytes of the data block copied on
each iteration.
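
Because _type_ may itself contain a hyphen, the size is best taken as the text after the last hyphen. A minimal sketch of splitting such a file name follows; the helper names `resfile_size` and `resfile_type` are hypothetical and do not come from the scripts.

```shell
# Sketch only: split prof-<type>-<size>.res into its parts.
# resfile_size and resfile_type are hypothetical helper names.
resfile_size() {
    base=$(basename "$1" .res)    # e.g. prof-vector-small-1024
    echo "${base##*-}"            # text after the last hyphen
}

resfile_type() {
    base=$(basename "$1" .res)
    base="${base#prof-}"          # e.g. vector-small-1024
    echo "${base%-*}"             # text before the last hyphen
}

resfile_type res-baseline/prof-vector-small-1024.res
resfile_size res-baseline/prof-vector-small-1024.res
```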

The total number of iterations for each test is determined by the number given
in the `--bytes` argument divided by the size of the data block being used for
the run.
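
The arithmetic can be sketched as follows. The function name `iterations` is hypothetical, and rounding down when the block size does not divide the total is an assumption; the script may handle the remainder differently.

```shell
# Sketch only: iteration count for one run.
# Assumes integer division (rounding down); run-perf.sh may differ.
iterations() {
    # $1: total bytes to copy, $2: size of the data block in bytes
    echo $(( $1 / $2 ))
}

iterations 1000000000 1024
```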

`perf record` is run using DWARF to determine the call graph. This gives
accurate results, but is slow. Expect each profiling run to take on the order
of 20 minutes on a decent server.
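
For reference, DWARF call graphs are selected with the `--call-graph dwarf` option of `perf record`. The sketch below only prints a command line rather than running it. The function name `perf_record_cmd` and the profiled command (`qemu-riscv64 ./memcpy.exe`) are hypothetical placeholders, not the invocation used by `run-perf.sh`.

```shell
# Sketch only: print, rather than run, a perf record command line.
# The profiled command shown in the usage line is a placeholder.
perf_record_cmd() {
    # $1: perf data file to write, remaining arguments: command to profile
    out="$1"
    shift
    echo "perf record --call-graph dwarf -o ${out} -- $*"
}

perf_record_cmd perf.data qemu-riscv64 ./memcpy.exe
```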

### `extract-top-level-funcs.sh`

```
./extract-top-level-funcs.sh --resfile <file> [--cutoff <num>] \
[--total|--self] [--omit-empty] [--md | --csv]
```

Extract the main results from a file generated by `run-perf.sh`. Arguments
are as follows.

- `--resfile` _file_: Target file to extract results from (mandatory)
- `--cutoff` _num_: Percentage below which to stop showing results
(optional). Default value 1
- `--total`: Cutoff and sorting based on total time (self + children)
(optional). Set by default.
- `--self`: Cutoff and sorting just based on self time (no children)
(optional). Opposite to `--total`, so not set by default.
- `--omit-empty`: Do not show results if self is 0.00 (optional). Only has
any effect in combination with `--total`.
- `--md`: Output results in Markdown format (optional). Set by default.
- `--csv`: Output results in CSV format.

**Note.** Only one of `--total` or `--self` may be specified. Only one of
`--md` or `--csv` may be specified.

This is the central script for extracting data from the Linux _perf_ results.
In general, `--self` gives the most useful data for targeting optimizations,
because it ranks functions by the time spent in their own code. `--total`
flags the same functions, but also functions that are merely wrappers for
other functions. The `--omit-empty` option helps when using `--total` by
skipping functions that purely wrap other functions.
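
The difference between the two modes can be sketched with a small filter. This is not the script's implementation: the function name `top_funcs` is hypothetical, the input format (total percentage, self percentage, function name, one function per line) is a simplifying assumption rather than the real `.res` layout, and the percentages in the usage example are invented for illustration.

```shell
# Sketch only: cutoff and sorting by total or self time.
# Assumed input on stdin: "<total>% <self>% <function>" per line.
top_funcs() {
    # $1: "total" or "self", $2: cutoff percentage,
    # $3: "yes" to omit rows whose self time is 0.00 (total mode only)
    if [ "$1" = "self" ]; then col=3; else col=2; fi
    awk -v mode="$1" -v cutoff="$2" -v omit="$3" '
        {
            total = $1 + 0      # "+ 0" drops the trailing % sign
            own = $2 + 0
            key = (mode == "self") ? own : total
            if (key < cutoff) next
            if (mode == "total" && omit == "yes" && own == 0) next
            printf "%s %.2f %.2f\n", $3, total, own
        }' | sort -k "${col},${col}nr"
}

# Invented sample data: main is a pure wrapper with no self time
printf '%s\n' \
    '50.00% 0.00% main' \
    '40.00% 30.00% helper_lookup_tb_ptr' \
    '20.00% 20.00% cpu_get_tb_cpu_state' \
    '0.50% 0.50% tiny_func' | top_funcs total 1 yes
```

With this sample, `total` mode without `--omit-empty` behaviour would rank `main` first, even though it spends no time in its own code.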

### `count-top-funcs.sh`

```
./count-top-funcs.sh [--resdir <dir>] [--total|--self] [--md | --csv]
```

Find the frequency of the most used functions in a set of data. This is a
wrapper for `extract-top-level-funcs.sh`. Arguments are as follows.

- `--resdir` _dir_: Directory with the results to be analysed (optional).
Default `res-baseline`
- `--total`: Cutoff and sorting based on total time (self + children)
(optional). Set by default.
- `--self`: Cutoff and sorting just based on self time (no children)
(optional). Opposite to `--total`, so not set by default.
- `--md`: Output results in Markdown format (optional). Set by default.
- `--csv`: Output results in CSV format.

**Note.** Only one of `--total` or `--self` may be specified. Only one of
`--md` or `--csv` may be specified.

The results to be analysed will be in files of the form
`prof-`_type_`-`_size_`.res`, where _type_ is one of `scalar`, `vector-small`
or `vector-large`, and _size_ is the size in bytes of the data block copied
on each iteration.
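
The counting step itself can be sketched with standard tools. The function name `count_funcs` is hypothetical, and the sketch assumes the top function names from all result files have already been collected, one per line; how the script gathers them is not shown here.

```shell
# Sketch only: count how often each function name appears.
# Assumed input on stdin: one function name per line.
count_funcs() {
    sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $2 "," $1 }'
}

printf '%s\n' helper_lookup_tb_ptr cpu_get_tb_cpu_state helper_lookup_tb_ptr |
    count_funcs
```

The output is CSV, with the most frequent function first.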

### `profile-all-funcs.sh`

```
./profile-all-funcs.sh [--resdir <dir>] [--type <str>] [--total|--self] \
[--funclist <list>]
```

Extract data on function usage for different data sizes in a form suitable for
graphical analysis. Arguments are as follows.

- `--resdir` _dir_: Directory with the results to be analysed (optional).
Default `res-baseline`.
- `--type` _str_: What type of result to look at (optional). Permitted values
are `scalar` (default), `vector-small` or `vector-large`.
- `--total`: Cutoff and sorting based on total time (self + children)
(optional). Set by default.
- `--self`: Cutoff and sorting just based on self time (no children)
(optional). Opposite to `--total`, so not set by default.
- `--funclist` _list_: Space-separated list of functions to profile. Default
value `helper_lookup_tb_ptr cpu_get_tb_cpu_state`.

**Note.** Only one of `--total` or `--self` may be specified.

This script typically takes a set of functions identified by
`count-top-funcs.sh`. The output is always CSV format.
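
A CSV layout suited to graphing has one row per data size and one column per function. The sketch below shows one way to build such a table. It is not the script's implementation: the function name `pivot_csv`, the input format (size, function name and percentage on each line) and the sample values are all assumptions made for illustration.

```shell
# Sketch only: pivot "size function percent" lines into CSV.
# Assumed input format and sample values are illustrative, not real results.
pivot_csv() {
    # $1: space-separated list of functions, used as column headings
    awk -v funcs="$1" '
        BEGIN { n = split(funcs, f, " ") }
        {
            # Remember sizes in order of first appearance
            if (!($1 in seen)) { seen[$1] = 1; order[++m] = $1 }
            val[$1, $2] = $3
        }
        END {
            printf "size"
            for (i = 1; i <= n; i++) printf ",%s", f[i]
            printf "\n"
            for (j = 1; j <= m; j++) {
                printf "%s", order[j]
                for (i = 1; i <= n; i++) {
                    # Leave the cell empty when a function has no entry
                    cell = ((order[j], f[i]) in val) ? val[order[j], f[i]] : ""
                    printf ",%s", cell
                }
                printf "\n"
            }
        }'
}

printf '%s\n' \
    '8 helper_lookup_tb_ptr 12.50' \
    '8 cpu_get_tb_cpu_state 3.10' \
    '16 helper_lookup_tb_ptr 11.00' |
    pivot_csv "helper_lookup_tb_ptr cpu_get_tb_cpu_state"
```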

### `run-spec-pop2.sh`

```
./run-spec-pop2.sh [--reportfile <file>] [--specdir <dir>]
```

**Note.** Because this is not specific to the `memcpy` benchmarks it lives in
the main `tooling` repository. Arguments are as follows.

- `--reportfile` _file_: Put the results in this file. Default
`prof-628.pop2_s.res` in the `tooling` repository

- `--specdir` _dir_: The directory holding the SPEC installation to be used.

This script runs the SPEC CPU 2017 `628.pop2_s` benchmark under QEMU with
Linux _perf_. The script runs a previously built benchmark. If necessary, use
`runspec-qemu.sh` to create the benchmark binary.