Commit 236a71a

Author: philippe
Document the new --fair-sched option.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12398 a5019735-40e9-0310-863c-91ae7b9d1cf9
1 parent 5786420 commit 236a71a

2 files changed: 126 additions, 2 deletions


NEWS

Lines changed: 6 additions & 0 deletions
@@ -27,6 +27,11 @@ Release 3.8.0 (????)
 * The C++ demangler has been updated so as to work well with C++
 compiled by even the most recent g++'s.
 
+* The new option --fair-sched controls the locking mechanism used by
+Valgrind to serialise thread execution. The locking mechanism
+influences the performance and scheduling of multithreaded applications
+(in particular on multiprocessor/multicore systems).
+
 * ==================== FIXED BUGS ====================
 
 The following bugs have been fixed or resolved. Note that "n-i-bz"
@@ -41,6 +46,7 @@ https://bugs.kde.org/show_bug.cgi?id=XXXXXX
 where XXXXXX is the bug number as listed below.
 
 247386 make perf does not run all performance tests
+270006 -Valgrind scheduler unfair
 270796 s390x: Removed broken support for the TS insn
 271438 Fix configure for proper SSE4.2 detection
 273114 s390x: Support TR, TRE, TROO, TROT, TRTO, and TRTT instructions

docs/xml/manual-core.xml

Lines changed: 120 additions & 2 deletions
@@ -1660,6 +1660,44 @@ need to use these.</para>
 </listitem>
 </varlistentry>
 
+<varlistentry id="opt.fair-sched" xreflabel="--fair-sched">
+<term>
+<option><![CDATA[--fair-sched=<no|yes|try> [default: no] ]]></option>
+</term>
+
+<listitem> <para>The <option>--fair-sched</option> option controls the
+locking mechanism used by Valgrind to serialise thread
+execution. The available locking mechanisms differ in the way threads
+are scheduled, and thus offer different trade-offs between fairness
+and performance. For more details about the Valgrind thread
+serialisation scheme and its impact on performance and thread
+scheduling, see <xref linkend="manual-core.pthreads_perf_sched"/>.
+
+<itemizedlist>
+<listitem> <para>The value <option>--fair-sched=yes</option>
+activates fair scheduling. In short, if multiple threads are
+ready to run, they are scheduled in a round-robin
+fashion. This mechanism is not available on all platforms or
+Linux versions. If it is not available,
+using <option>--fair-sched=yes</option> causes Valgrind to
+terminate with an error.</para>
+</listitem>
+
+<listitem> <para>The value <option>--fair-sched=try</option>
+activates fair scheduling if it is available on the
+platform. Otherwise, Valgrind automatically falls back
+to <option>--fair-sched=no</option>.</para>
+</listitem>
+
+<listitem> <para>The value <option>--fair-sched=no</option> activates
+a scheduling mechanism that does not guarantee fairness
+between threads ready to run.</para>
+</listitem>
+</itemizedlist>
+</para></listitem>
+
+</varlistentry>
+
 <varlistentry id="opt.kernel-variant" xreflabel="--kernel-variant">
 <term>
 <option>--kernel-variant=variant1,variant2,...</option>
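The documented behaviour of the three `--fair-sched` values amounts to a small decision table. The C sketch below only illustrates those rules: `lock_kind`, `choose_lock` and the `futex_available` flag are hypothetical names, not Valgrind's internal API, and treating an unrecognised value as fatal is an assumption of the sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical types/functions: an illustration of the documented
   semantics of --fair-sched, not Valgrind's implementation. */
typedef enum {
    LOCK_PIPE,   /* unfair pipe-based lock (the default) */
    LOCK_FUTEX,  /* fair futex-based lock */
    LOCK_FATAL   /* Valgrind terminates with an error */
} lock_kind;

/* Decide which lock a --fair-sched value selects, given whether the
   futex-based lock is supported on this platform/kernel. */
lock_kind choose_lock(const char *fair_sched, bool futex_available)
{
    if (strcmp(fair_sched, "no") == 0)
        return LOCK_PIPE;
    if (strcmp(fair_sched, "try") == 0)
        /* Use the fair lock if possible, otherwise fall back to "no". */
        return futex_available ? LOCK_FUTEX : LOCK_PIPE;
    if (strcmp(fair_sched, "yes") == 0)
        /* Fairness explicitly requested: no silent fallback. */
        return futex_available ? LOCK_FUTEX : LOCK_FATAL;
    return LOCK_FATAL; /* unrecognised value (assumption) */
}
```

With these rules, `choose_lock("try", false)` returns `LOCK_PIPE` (silent fallback), whereas `choose_lock("yes", false)` returns `LOCK_FATAL`, matching the documented termination with an error.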
@@ -1836,8 +1874,8 @@ that your program will use the native threading library, but Valgrind
 serialises execution so that only one (kernel) thread is running at a
 time. This approach avoids the horrible implementation problems of
 implementing a truly multithreaded version of Valgrind, but it does
-mean that threaded apps run only on one CPU, even if you have a
-multiprocessor or multicore machine.</para>
+mean that threaded apps never use more than one CPU simultaneously,
+even if you have a multiprocessor or multicore machine.</para>
 
 <para>Valgrind doesn't schedule the threads itself. It merely ensures
 that only one thread runs at once, using a simple locking scheme. The
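The pipe-based lock used by default (`--fair-sched=no`) can be pictured as a POSIX pipe holding a single token byte: acquiring the lock means reading that byte, releasing it means writing it back. The sketch below is a minimal illustration under that assumption, not Valgrind's actual code; `pipe_lock`, its functions and `pipe_lock_demo` are hypothetical names. Because nothing guarantees that waiting readers receive the byte in arrival order, such a lock provides mutual exclusion but no fairness.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical pipe-based lock (illustrative sketch, not Valgrind's
   code): the pipe holds exactly one token byte while the lock is free. */
typedef struct {
    int fds[2]; /* fds[0] = read end, fds[1] = write end */
} pipe_lock;

bool pipe_lock_init(pipe_lock *l)
{
    char token = 'L';
    if (pipe(l->fds) != 0)
        return false;
    /* The token starts in the pipe, so the lock starts out free. */
    return write(l->fds[1], &token, 1) == 1;
}

bool pipe_lock_acquire(pipe_lock *l)
{
    char token;
    ssize_t n;
    /* Block until the token can be read. If several threads wait here,
       nothing guarantees first-come, first-served order. */
    do {
        n = read(l->fds[0], &token, 1);
    } while (n == -1 && errno == EINTR);
    return n == 1;
}

bool pipe_lock_release(pipe_lock *l)
{
    char token = 'L';
    ssize_t n;
    /* Put the token back. The releasing thread may well be the one that
       reads it again first: this is the unfairness of the pipe lock. */
    do {
        n = write(l->fds[1], &token, 1);
    } while (n == -1 && errno == EINTR);
    return n == 1;
}

/* Demo: NTHREADS threads each increment a shared counter NITERS times
   while holding the lock; returns the final count (or -1 on error). */
enum { NTHREADS = 4, NITERS = 10000 };
static pipe_lock demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        if (!pipe_lock_acquire(&demo_lock))
            break;
        demo_counter++;                 /* critical section */
        pipe_lock_release(&demo_lock);
    }
    return NULL;
}

long pipe_lock_demo(void)
{
    pthread_t t[NTHREADS];
    if (!pipe_lock_init(&demo_lock))
        return -1;
    demo_counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, demo_worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    close(demo_lock.fds[0]);
    close(demo_lock.fds[1]);
    return demo_counter;
}
```

Every increment happens while the single token is held, so the demo loses no updates; which thread runs next after each release, however, is left entirely to the kernel.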
@@ -1860,6 +1898,86 @@ everything is shared (a thread) or nothing is shared (fork-like); partial
 sharing will fail.
 </para>
 
+<sect2 id="manual-core.pthreads_perf_sched" xreflabel="Scheduling and Multi-Thread Performance">
+<title>Scheduling and Multi-Thread Performance</title>
+
+<para>A thread executes code only while it holds the lock. After
+executing a certain number of instructions, the running thread releases
+the lock. All threads ready to run then compete to acquire it.</para>
+
+<para>The option <option>--fair-sched</option> controls the locking mechanism
+used to serialise thread execution.</para>
+
+<para> The default pipe-based locking
+(<option>--fair-sched=no</option>) is available on all platforms. The
+pipe-based locking does not guarantee fairness between threads: it is
+quite possible that the thread that has just released the lock
+gets it back immediately. When pipe-based locking is used, different
+executions of the same multithreaded application may show very different
+thread scheduling.</para>
+
+<para> The futex-based locking is available on some platforms.
+If available, it is activated by <option>--fair-sched=yes</option> or
+<option>--fair-sched=try</option>. The futex-based locking ensures
+fairness between threads: if multiple threads are ready to run, the lock
+is given to the thread that requested it first. Note that a thread
+that is blocked in a system call (e.g. in a blocking read system call) has
+not (yet) requested the lock: such a thread requests the lock only after the
+system call has finished.</para>
+
+<para> The fairness of the futex-based locking gives better reproducibility
+of the thread scheduling across different executions of a multithreaded
+application. This fairness and better reproducibility are particularly
+useful when using Helgrind or DRD.</para>
+
+<para> The Valgrind thread serialisation implies that only one thread
+is running at a time. On a multiprocessor/multicore system, the
+running thread is assigned to one of the CPUs by the OS kernel
+scheduler. When a thread acquires the lock, it is sometimes assigned
+to the same CPU as the thread that just released the lock, and
+sometimes to another CPU. When
+pipe-based locking is used, the thread that just acquired the lock
+will often be scheduled on the same CPU as the thread that just
+released the lock. With the futex-based mechanism, the thread that
+just acquired the lock will more often be scheduled on another
+CPU. </para>
+
+<para>The Valgrind thread serialisation and CPU assignment by the OS
+kernel scheduler can interact badly with the CPU frequency scaling
+available on many modern CPUs: to decrease power consumption, the
+frequency of a CPU or core is automatically decreased if the CPU/core
+has not been used recently. If the OS kernel often assigns the thread
+that just acquired the lock to another CPU/core, there is a good
+chance that this CPU/core is currently at a low frequency. The
+frequency of this CPU will be increased after some time. However,
+during this time, the (only) running thread will have run at a low
+frequency. Once this thread has run for some time, it will release
+the lock. Another thread will acquire this lock, and might again be
+scheduled on another CPU whose clock frequency was decreased in
+the meantime.</para>
+
+<para>The futex-based locking causes threads to switch CPU/core more
+often. So, if CPU frequency scaling is activated, the futex-based
+locking might significantly decrease the performance of a
+multithreaded app running under Valgrind (performance degradations of
+up to 50% have been observed). The pipe-based locking also interacts
+somewhat badly with CPU frequency scaling: performance degradations of
+up to 10-20% have been observed. </para>
+
+<para>To avoid this performance degradation, you can indicate to the
+kernel that all CPUs/cores should always run at maximum clock
+speed. Depending on your Linux distribution, CPU frequency scaling
+might be controlled using a graphical interface or using command-line
+tools such as
+<computeroutput>cpufreq-selector</computeroutput> or
+<computeroutput>cpufreq-set</computeroutput>. You might also ask the
+OS scheduler to run a Valgrind process on a specific (fixed) CPU using the
+<computeroutput>taskset</computeroutput> command: running on a fixed
+CPU should ensure that this specific CPU keeps a high clock frequency.
+</para>
+
+</sect2>
+
 
 </sect1>
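The guarantee that the lock goes to the thread that requested it first is the defining property of a ticket lock. The C11 sketch below illustrates that property under stated assumptions; it is not Valgrind's implementation. To stay portable it waits with `sched_yield()` instead of sleeping on a futex, and `ticket_lock`, its functions and `ticket_lock_fifo_demo` are hypothetical names.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical ticket lock: each thread takes a ticket and waits until
   now_serving reaches it, so the lock is granted in request order. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to hold the lock */
} ticket_lock;

void ticket_lock_init(ticket_lock *l)
{
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

void ticket_lock_acquire(ticket_lock *l)
{
    /* Requesting the lock = taking a ticket (atomic: no duplicates). */
    unsigned my_ticket = atomic_fetch_add(&l->next_ticket, 1);
    /* Wait for our turn. A futex-based lock would sleep here instead
       of repeatedly yielding the CPU. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire)
           != my_ticket)
        sched_yield();
}

void ticket_lock_release(ticket_lock *l)
{
    /* Only the holder writes now_serving, so load + store is safe. */
    unsigned next = atomic_load_explicit(&l->now_serving,
                                         memory_order_relaxed) + 1;
    atomic_store_explicit(&l->now_serving, next, memory_order_release);
}

/* Demo of first-come, first-served: the main thread holds the lock while
   worker A and then worker B request it; returns 1 if A got it first. */
static ticket_lock demo_lock;
static char order[2];
static int order_len;

static void *worker(void *arg)
{
    ticket_lock_acquire(&demo_lock);
    order[order_len++] = *(const char *)arg;  /* protected by the lock */
    ticket_lock_release(&demo_lock);
    return NULL;
}

int ticket_lock_fifo_demo(void)
{
    static const char a = 'A', b = 'B';
    pthread_t ta, tb;
    ticket_lock_init(&demo_lock);
    order_len = 0;
    ticket_lock_acquire(&demo_lock);              /* main: ticket 0 */
    pthread_create(&ta, NULL, worker, (void *)&a);
    while (atomic_load(&demo_lock.next_ticket) < 2)
        sched_yield();                            /* A holds ticket 1 */
    pthread_create(&tb, NULL, worker, (void *)&b);
    while (atomic_load(&demo_lock.next_ticket) < 3)
        sched_yield();                            /* B holds ticket 2 */
    ticket_lock_release(&demo_lock);              /* serve ticket 1 */
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return order_len == 2 && order[0] == 'A' && order[1] == 'B';
}
```

Unlike the pipe-based scheme, a thread that releases this lock and immediately requests it again gets a new ticket behind every thread already waiting. Spinning with `sched_yield()` costs CPU time while waiting; a futex-based wait lets waiting threads sleep in the kernel instead.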

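The `taskset` command is a front end to the Linux CPU-affinity system calls. As a sketch of what running on a fixed CPU means at that level, the C function below (hypothetical name `pin_to_first_allowed_cpu`; Linux/glibc-specific) restricts the calling thread to the first CPU in its current affinity mask using `sched_getaffinity()` and `sched_setaffinity()`. Affinity masks are inherited by new threads and child processes and preserved across `execve()`, so, as an assumption about usage, a small launcher could call it and then exec Valgrind.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: restrict the calling thread (and any threads or
   processes it later creates or execs) to the first CPU it is currently
   allowed to run on. Returns that CPU number, or -1 on error. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);                   /* mask with a single CPU */
        if (sched_setaffinity(0, sizeof one, &one) != 0)
            return -1;
        return cpu;
    }
    return -1; /* empty affinity mask: should not happen */
}
```

From the shell, `taskset -c 0 valgrind --fair-sched=yes ./myprog` has the same effect for CPU 0; `./myprog` is a placeholder for the program under test.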