@@ -1660,6 +1660,44 @@ need to use these.</para>
</listitem>
</varlistentry>

+ <varlistentry id="opt.fair-sched" xreflabel="--fair-sched">
+ <term>
+ <option><![CDATA[--fair-sched=<no|yes|try> [default: no] ]]></option>
+ </term>
+
+ <listitem> <para>The <option>--fair-sched</option> option controls
+ the locking mechanism used by Valgrind to serialise thread
+ execution. The locking mechanisms differ in the way the threads
+ are scheduled, offering different trade-offs between fairness and
+ performance. For more details about the Valgrind thread
+ serialisation principle and its impact on performance and thread
+ scheduling, see <xref linkend="manual-core.pthreads_perf_sched"/>.
+
1676
+ <itemizedlist>
+ <listitem> <para>The value <option>--fair-sched=yes</option>
+ activates fair scheduling: if multiple threads are ready to run,
+ they are scheduled in a round-robin fashion. This mechanism is not
+ available on all platforms or Linux versions. If it is not
+ available, using <option>--fair-sched=yes</option> will cause
+ Valgrind to terminate with an error.</para>
+ </listitem>
+
1686
+ <listitem> <para>The value <option>--fair-sched=try</option>
+ activates fair scheduling if it is available on the platform;
+ otherwise, it automatically falls back
+ to <option>--fair-sched=no</option>.</para>
+ </listitem>
+
1692
+ <listitem> <para>The value <option>--fair-sched=no</option> activates
+ a scheduling mechanism which does not guarantee fairness
+ between threads ready to run.</para>
+ </listitem>
+ </itemizedlist>
+ </para></listitem>
+
+ </varlistentry>
+
<varlistentry id="opt.kernel-variant" xreflabel="--kernel-variant">
<term>
<option>--kernel-variant=variant1,variant2,...</option>
@@ -1836,8 +1874,8 @@ that your program will use the native threading library, but Valgrind
serialises execution so that only one (kernel) thread is running at a
time. This approach avoids the horrible problems of implementing a
truly multithreaded version of Valgrind, but it does
- mean that threaded apps run only on one CPU, even if you have a
- multiprocessor or multicore machine.</para>
+ mean that threaded apps never use more than one CPU simultaneously,
+ even if you have a multiprocessor or multicore machine.</para>

<para>Valgrind doesn't schedule the threads itself. It merely ensures
that only one thread runs at once, using a simple locking scheme. The
@@ -1860,6 +1898,86 @@ everything is shared (a thread) or nothing is shared (fork-like); partial
sharing will fail.
</para>

+ <sect2 id="manual-core.pthreads_perf_sched" xreflabel="Scheduling and Multi-Thread Performance">
+ <title>Scheduling and Multi-Thread Performance</title>
+
+ <para>A thread executes code only while it holds the lock. After
+ executing a certain number of instructions, the running thread
+ releases the lock, and all threads ready to run compete to acquire
+ it.</para>
+
1908
+ <para>The option <option>--fair-sched</option> controls the locking
+ mechanism used to serialise thread execution.</para>
+
1911
+ <para>The default pipe-based locking
+ (<option>--fair-sched=no</option>) is available on all platforms.
+ Pipe-based locking does not guarantee fairness between threads: it is
+ quite possible that the thread that has just released the lock
+ reacquires it immediately. With pipe-based locking, different runs
+ of the same multithreaded application might produce very different
+ thread schedules.</para>
+
1919
+ <para>Futex-based locking is available on some platforms. Where
+ available, it is activated by <option>--fair-sched=yes</option> or
+ <option>--fair-sched=try</option>. Futex-based locking ensures
+ fairness between threads: if multiple threads are ready to run, the
+ lock is given to the thread that requested it first. Note that a
+ thread blocked in a system call (e.g. in a blocking read) has not
+ (yet) requested the lock: such a thread requests the lock only once
+ the system call finishes.</para>
+
1928
+ <para>The fairness of futex-based locking makes the thread
+ scheduling more reproducible across different executions of a
+ multithreaded application. This improved reproducibility is
+ particularly useful when using Helgrind or DRD.</para>
+
1933
+ <para>Valgrind's thread serialisation implies that only one thread
+ runs at a time. On a multiprocessor/multicore system, the running
+ thread is assigned to one of the CPUs by the OS kernel scheduler.
+ When a thread acquires the lock, it is sometimes assigned to the
+ same CPU as the thread that just released the lock, and sometimes to
+ another CPU. With pipe-based locking, the thread that just acquired
+ the lock will often be scheduled on the same CPU as the thread that
+ just released it. With the futex-based mechanism, the acquiring
+ thread will more often be scheduled on another CPU.</para>
+
1945
+ <para>Valgrind's thread serialisation and the kernel scheduler's CPU
+ assignment can interact badly with the CPU frequency scaling
+ available on many modern CPUs: to reduce power consumption, the
+ frequency of a CPU or core is automatically lowered when it has not
+ been used recently. If the kernel often assigns the thread that just
+ acquired the lock to another CPU/core, there is a good chance that
+ this CPU/core is currently running at a low frequency. Its frequency
+ will be raised after some time, but until then the (only) running
+ thread executes at the low frequency. Once that thread has run for a
+ while, it releases the lock; another thread acquires it, and might
+ again be scheduled on a CPU whose clock frequency was lowered in the
+ meantime.</para>
+
1959
+ <para>Futex-based locking causes threads to switch CPUs/cores more
+ often. So, if CPU frequency scaling is active, futex-based locking
+ can significantly degrade the performance of a multithreaded
+ application running under Valgrind (degradations of up to 50% have
+ been observed). Pipe-based locking also interacts somewhat badly
+ with CPU frequency scaling; degradations of 10 to 20% have been
+ observed.</para>
+
1967
+ <para>To avoid this performance degradation, you can tell the kernel
+ that all CPUs/cores should always run at maximum clock speed.
+ Depending on your Linux distribution, CPU frequency scaling can be
+ controlled via a graphical interface or with command-line tools such
+ as <computeroutput>cpufreq-selector</computeroutput> or
+ <computeroutput>cpufreq-set</computeroutput>. You can also ask the
+ OS scheduler to run a Valgrind process on a specific (fixed) CPU
+ using the <computeroutput>taskset</computeroutput> command: running
+ on a fixed CPU should ensure that this CPU keeps a high clock
+ frequency.</para>
+
1979
+ </sect2>
+
</sect1>