C++ Benchmark Library lets you create performance benchmarks to investigate average/minimal/maximal execution time, items processing speed, and I/O throughput. The CppBenchmark library has many features and supports different kinds of scenarios, such as micro-benchmarks, benchmarks with fixtures and parameters, thread benchmarks, and the producers/consumers pattern.
- Features
 - Requirements
 - How to build?
 - How to create a benchmark?
 - Benchmark examples
- Example 1: Benchmark of a function call
 - Example 2: Benchmark with cancelation
 - Example 3: Benchmark with static fixture
 - Example 4: Benchmark with dynamic fixture
 - Example 5: Benchmark with parameters
 - Example 6: Benchmark class
 - Example 7: Benchmark I/O operations
 - Example 8: Benchmark latency with auto update
 - Example 9: Benchmark latency with manual update
 - Example 10: Benchmark threads
 - Example 11: Benchmark threads with fixture
 - Example 12: Benchmark single producer, single consumer pattern
 - Example 13: Benchmark multiple producers, multiple consumers pattern
 - Example 14: Dynamic benchmarks
 
 - Command line options
 
- Cross platform (Linux, MacOS, Windows)
 - Micro-benchmarks
 - Benchmarks with static fixtures and dynamic fixtures
 - Benchmarks with parameters (single, pair, triple parameters, ranges, ranges with selectors)
 - Benchmark infinite run with cancelation
 - Benchmark items processing speed
 - Benchmark I/O throughput
 - Benchmark latency with High Dynamic Range (HDR) Histograms
 - Benchmark threads
 - Benchmark producers/consumers pattern
 - Different reporting formats: console, csv, json
 - Colored console progress and report
 
Optional:
Install gil (git links) tool
pip3 install gil

git clone https://github.com/chronoxor/CppBenchmark.git
cd CppBenchmark
gil update

cd build
./unix.sh

cd build
./unix.sh

cd build
unix.bat

cd build
unix.bat

cd build
mingw.bat

cd build
vs.bat

- Build CppBenchmark library
 - Create a new *.cpp file
 - Insert #include "benchmark/cppbenchmark.h"
 - Add benchmark code (examples for different scenarios you can find below)
 - Insert BENCHMARK_MAIN() at the end
 - Compile the *.cpp file and link it with CppBenchmark library
 - Run it (see also possible command line options)
 
#include "benchmark/cppbenchmark.h"
#include <math.h>
// Benchmark sin() call for 5 seconds (by default).
// Make 5 attempts (by default) and choose the one with the best time result.
BENCHMARK("sin")
{
    sin(123.456);
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: sin()
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: sin()
Average time: 6 ns/op
Minimal time: 6 ns/op
Maximal time: 6 ns/op
Total time: 858.903 ms
Total operations: 130842248
Operations throughput: 152336350 ops/s
===============================================================================
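The summary fields in a report like the one above can be related to each other with simple arithmetic. The sketch below is not part of CppBenchmark: `Summary` and `Summarize` are hypothetical names, and it assumes that the average time is the total time divided by the total operations and that throughput is the total operations divided by the total time in seconds. The published figures appear consistent with that reading.

```cpp
#include <cstdint>

// Hypothetical helper type, not part of the CppBenchmark API.
struct Summary
{
    double average_ns_per_op;   // total time / total operations
    double ops_per_second;      // total operations / total time in seconds
};

// Derive per-operation time and throughput from the two raw totals.
// Assumption: this mirrors how the report fields relate to each other.
inline Summary Summarize(double total_time_ns, std::uint64_t total_operations)
{
    Summary s{};
    if (total_operations == 0 || total_time_ns <= 0.0)
        return s; // nothing measured, leave both fields at zero
    s.average_ns_per_op = total_time_ns / static_cast<double>(total_operations);
    s.ops_per_second = static_cast<double>(total_operations) * 1e9 / total_time_ns;
    return s;
}
```

For the sin() report, `Summarize(858.903e6, 130842248)` gives roughly 6.56 ns/op and about 152.3 million ops/s, in line with the reported 6 ns/op and 152336350 ops/s.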
#include "benchmark/cppbenchmark.h"
#include <cstdlib>
// Benchmark rand() call until it returns 0.
// Benchmark will print operations count required to get 'rand() == 0' case.
// Make 10 attempts and choose the one with the best time result.
BENCHMARK("rand-till-zero", Settings().Infinite().Attempts(10))
{
    if (rand() == 0)
        context.Cancel();
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: rand()-till-zero
Attempts: 10
-------------------------------------------------------------------------------
Phase: rand()-till-zero
Average time: 15 ns/op
Minimal time: 15 ns/op
Maximal time: 92 ns/op
Total time: 159.936 mcs
Total operations: 10493
Operations throughput: 65607492 ops/s
===============================================================================
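The control flow behind an infinite benchmark with cancellation can be pictured with a small stand-alone loop. This is only a sketch under assumptions: `MockContext` and `RunUntilCanceled` are hypothetical names that imitate `context.Cancel()`, they are not CppBenchmark types, and the real library additionally measures time and repeats the run for every attempt.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the benchmark context; it only models cancellation.
struct MockContext
{
    bool canceled = false;
    void Cancel() { canceled = true; }
};

// Call the operation repeatedly until it cancels the context,
// and return how many times it was called.
inline std::uint64_t RunUntilCanceled(const std::function<void(MockContext&)>& operation)
{
    MockContext context;
    std::uint64_t operations = 0;
    while (!context.canceled)
    {
        operation(context);
        ++operations;
    }
    return operations;
}
```

An operation that cancels on its fifth call, for example, makes `RunUntilCanceled` return 5. An operation that never cancels would keep this loop running forever, which is the behavior the infinite setting asks for.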
A static fixture is constructed once per benchmark, stays the same for every attempt and operation, and is destroyed at the end of the benchmark.
#include "benchmark/cppbenchmark.h"
#include <list>
#include <vector>
template <typename T>
class ContainerFixture
{
protected:
    T container;
    ContainerFixture()
    {
        for (int i = 0; i < 1000000; ++i)
            container.push_back(rand());
    }
};
BENCHMARK_FIXTURE(ContainerFixture<std::list<int>>, "std::list<int>.forward")
{
    for (auto it = container.begin(); it != container.end(); ++it)
        ++(*it);
}
BENCHMARK_FIXTURE(ContainerFixture<std::list<int>>, "std::list<int>.backward")
{
    for (auto it = container.rbegin(); it != container.rend(); ++it)
        ++(*it);
}
BENCHMARK_FIXTURE(ContainerFixture<std::vector<int>>, "std::vector<int>.forward")
{
    for (auto it = container.begin(); it != container.end(); ++it)
        ++(*it);
}
BENCHMARK_FIXTURE(ContainerFixture<std::vector<int>>, "std::vector<int>.backward")
{
    for (auto it = container.rbegin(); it != container.rend(); ++it)
        ++(*it);
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: std::list<int>-forward
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: std::list<int>-forward
Average time: 6.332 ms/op
Minimal time: 6.332 ms/op
Maximal time: 6.998 ms/op
Total time: 4.958 s
Total operations: 783
Operations throughput: 157 ops/s
===============================================================================
Benchmark: std::list<int>-backward
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: std::list<int>-backward
Average time: 7.883 ms/op
Minimal time: 7.883 ms/op
Maximal time: 8.196 ms/op
Total time: 4.911 s
Total operations: 623
Operations throughput: 126 ops/s
===============================================================================
Benchmark: std::vector<int>-forward
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: std::vector<int>-forward
Average time: 298.114 mcs/op
Minimal time: 298.114 mcs/op
Maximal time: 308.209 mcs/op
Total time: 4.852 s
Total operations: 16276
Operations throughput: 3354 ops/s
===============================================================================
Benchmark: std::vector<int>-backward
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: std::vector<int>-backward
Average time: 316.412 mcs/op
Minimal time: 316.412 mcs/op
Maximal time: 350.224 mcs/op
Total time: 4.869 s
Total operations: 15390
Operations throughput: 3160 ops/s
===============================================================================
A dynamic fixture can be used to prepare the benchmark before each attempt with the Initialize() / Cleanup() methods. You can access the current benchmark context in dynamic fixture methods.
#include "benchmark/cppbenchmark.h"
#include <deque>
#include <list>
#include <vector>
template <typename T>
class ContainerFixture : public virtual CppBenchmark::Fixture
{
protected:
    T container;
    void Initialize(CppBenchmark::Context& context) override { container = T(); }
    void Cleanup(CppBenchmark::Context& context) override { container.clear(); }
};
BENCHMARK_FIXTURE(ContainerFixture<std::list<int>>, "std::list<int>.push_back")
{
    container.push_back(0);
}
BENCHMARK_FIXTURE(ContainerFixture<std::vector<int>>, "std::vector<int>.push_back")
{
    container.push_back(0);
}
BENCHMARK_FIXTURE(ContainerFixture<std::deque<int>>, "std::deque<int>.push_back")
{
    container.push_back(0);
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: std::list<int>.push_back()
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: std::list<int>.push_back()
Average time: 35 ns/op
Minimal time: 35 ns/op
Maximal time: 39 ns/op
Total time: 2.720 s
Total operations: 76213307
Operations throughput: 28009633 ops/s
===============================================================================
Benchmark: std::vector<int>.push_back()
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: std::vector<int>.push_back()
Average time: 5 ns/op
Minimal time: 5 ns/op
Maximal time: 5 ns/op
Total time: 722.837 ms
Total operations: 126890166
Operations throughput: 175544557 ops/s
===============================================================================
Benchmark: std::deque<int>.push_back()
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: std::deque<int>.push_back()
Average time: 12 ns/op
Minimal time: 12 ns/op
Maximal time: 12 ns/op
Total time: 1.319 s
Total operations: 105369784
Operations throughput: 79858488 ops/s
===============================================================================
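The difference between the two fixture kinds is when their setup code runs. The following stand-alone sketch imitates that lifecycle. `LifecycleLog`, `MockDynamicFixture` and `RunAttempts` are hypothetical names, not CppBenchmark classes, and the ordering shown is an assumption based on the fixture descriptions in this document (constructor once per benchmark, Initialize() / Cleanup() once per attempt).

```cpp
#include <string>
#include <vector>

// Records the order of lifecycle events for inspection.
using LifecycleLog = std::vector<std::string>;

// Hypothetical fixture: the constructor plays the role of a static fixture,
// Initialize()/Cleanup() play the role of a dynamic fixture.
class MockDynamicFixture
{
public:
    explicit MockDynamicFixture(LifecycleLog& log) : _log(log) { _log.push_back("construct"); }
    ~MockDynamicFixture() { _log.push_back("destruct"); }
    void Initialize() { _log.push_back("initialize"); }
    void Cleanup() { _log.push_back("cleanup"); }
    void Run() { _log.push_back("run"); }
private:
    LifecycleLog& _log;
};

// Run the given number of attempts with one operation each.
inline LifecycleLog RunAttempts(int attempts)
{
    LifecycleLog log;
    {
        MockDynamicFixture fixture(log);   // once per benchmark
        for (int i = 0; i < attempts; ++i)
        {
            fixture.Initialize();          // before each attempt
            fixture.Run();
            fixture.Cleanup();             // after each attempt
        }
    }                                      // fixture destroyed here
    return log;
}
```

With two attempts the log reads construct, initialize, run, cleanup, initialize, run, cleanup, destruct. Expensive one-time setup therefore belongs in the constructor, while state that must be fresh for each attempt belongs in Initialize().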
Additional parameters can be provided to a benchmark through settings using fluent syntax. Parameters can be single, pair or triple, provided as a value, as a range, or as a range with a selector function. The benchmark will be launched for each combination of parameters.
#include "benchmark/cppbenchmark.h"
#include <algorithm>
#include <vector>
class SortFixture : public virtual CppBenchmark::Fixture
{
protected:
    std::vector<int> items;
    void Initialize(CppBenchmark::Context& context) override
    {
        items.resize(context.x());
        std::generate(items.begin(), items.end(), rand);
    }
    void Cleanup(CppBenchmark::Context& context) override
    {
        items.clear();
    }
};
BENCHMARK_FIXTURE(SortFixture, "std::sort", Settings().Param(1000000).Param(10000000))
{
    std::sort(items.begin(), items.end());
    context.metrics().AddItems(items.size());
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: std::sort
Attempts: 5
Operations: 1
-------------------------------------------------------------------------------
Phase: std::sort(1000000)
Total time: 66.976 ms
Total items: 1000000
Items throughput: 14930626 ops/s
-------------------------------------------------------------------------------
Phase: std::sort(10000000)
Total time: 644.141 ms
Total items: 10000000
Items throughput: 15524528 ops/s
===============================================================================
You can also create a benchmark by inheriting from the CppBenchmark::Benchmark class and implementing the Run() method. You can use the AddItems() method of the benchmark context metrics to register processed items.
#include "benchmark/cppbenchmark.h"
#include <algorithm>
#include <vector>
class StdSort : public CppBenchmark::Benchmark
{
public:
    using Benchmark::Benchmark;
protected:
    std::vector<int> items;
    void Initialize(CppBenchmark::Context& context) override
    {
        items.resize(context.x());
        std::generate(items.begin(), items.end(), rand);
    }
    void Cleanup(CppBenchmark::Context& context) override
    {
        items.clear();
    }
    void Run(CppBenchmark::Context& context) override
    {
        std::sort(items.begin(), items.end());
        context.metrics().AddItems(items.size());
    }
};
BENCHMARK_CLASS(StdSort, "std::sort", Settings().Param(10000000))
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: std::sort
Attempts: 5
Operations: 1
-------------------------------------------------------------------------------
Phase: std::sort(10000000)
Total time: 648.461 ms
Total items: 10000000
Items throughput: 15421124 ops/s
===============================================================================
You can use the AddBytes() method of the benchmark context metrics to register processed data.
#include "benchmark/cppbenchmark.h"
#include <array>
#include <cstdio>
const int chunk_size_from = 32;
const int chunk_size_to = 4096;
// Create settings for the benchmark which will launch for each chunk size
// scaled from 32 bytes to 4096 bytes (32, 64, 128, 256, 512, 1024, 2048, 4096).
const auto settings = CppBenchmark::Settings()
    .ParamRange(
        chunk_size_from, chunk_size_to, [](int from, int to, int& result)
        {
            int r = result;
            result *= 2;
            return r;
        }
    );
class FileFixture
{
public:
    FileFixture()
    {
        // Open file for binary write
        file = fopen("fwrite.out", "wb");
    }
    ~FileFixture()
    {
        // Close file
        fclose(file);
        // Delete file
        remove("fwrite.out");
    }
protected:
    FILE* file;
    std::array<char, chunk_size_to> buffer;
};
BENCHMARK_FIXTURE(FileFixture, "fwrite", settings)
{
    fwrite(buffer.data(), sizeof(char), context.x(), file);
    context.metrics().AddBytes(context.x());
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: fwrite()
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: fwrite()(32)
Average time: 55 ns/op
Minimal time: 55 ns/op
Maximal time: 108 ns/op
Total time: 2.821 s
Total operations: 50703513
Total bytes: 1.523 GiB
Operations throughput: 17968501 ops/s
Bytes throughput: 548.363 MiB/s
-------------------------------------------------------------------------------
Phase: fwrite()(64)
Average time: 93 ns/op
Minimal time: 93 ns/op
Maximal time: 162 ns/op
Total time: 3.820 s
Total operations: 40744084
Total bytes: 2.438 GiB
Operations throughput: 10665202 ops/s
Bytes throughput: 650.975 MiB/s
-------------------------------------------------------------------------------
...
-------------------------------------------------------------------------------
Phase: fwrite()(2048)
Average time: 8.805 mcs/op
Minimal time: 8.805 mcs/op
Maximal time: 11.895 mcs/op
Total time: 3.968 s
Total operations: 450686
Total bytes: 880.252 MiB
Operations throughput: 113569 ops/s
Bytes throughput: 221.835 MiB/s
-------------------------------------------------------------------------------
Phase: fwrite()(4096)
Average time: 19.485 mcs/op
Minimal time: 19.485 mcs/op
Maximal time: 20.887 mcs/op
Total time: 4.906 s
Total operations: 251821
Total bytes: 983.692 MiB
Operations throughput: 51319 ops/s
Bytes throughput: 200.478 MiB/s
===============================================================================
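To see which parameter values a range selector produces, the selector can be driven by a small loop outside the library. The sketch below assumes the selector is called repeatedly with a running `result` value that starts at `from`, and that enumeration stops once the returned value leaves the [from, to] interval. `EnumerateRange` and `Doubling` are hypothetical helpers, and the library's actual driving loop may differ.

```cpp
#include <functional>
#include <vector>

// Same shape as the lambda passed to ParamRange() in the example.
using Selector = std::function<int(int, int, int&)>;

// Hypothetical helper: collect every value the selector yields within [from, to].
// A selector that never leaves the interval would make this loop run forever.
inline std::vector<int> EnumerateRange(int from, int to, const Selector& selector)
{
    std::vector<int> values;
    int current = from;                       // running state updated by the selector
    int value = selector(from, to, current);
    while (value >= from && value <= to)
    {
        values.push_back(value);
        value = selector(from, to, current);
    }
    return values;
}

// The doubling selector from the example: return the current value, then double it.
inline int Doubling(int /*from*/, int /*to*/, int& result)
{
    int r = result;
    result *= 2;
    return r;
}
```

Under these assumptions, `EnumerateRange(32, 4096, Doubling)` yields 32, 64, 128, 256, 512, 1024, 2048, 4096, which matches the chunk sizes listed in the example's comment.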
#include "benchmark/cppbenchmark.h"
#include <chrono>
#include <thread>
const auto settings = CppBenchmark::Settings().Latency(1, 1000000000, 5);
BENCHMARK("sleep", settings)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: sleep
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: sleep
Latency (Min): 10.014 ms/op
Latency (Max): 11.377 ms/op
Latency (Mean): 1.04928e+07
Latency (StDv): 364511
Total time: 4.985 s
Total operations: 571
Operations throughput: 114 ops/s
===============================================================================
If the benchmark is launched with the --histograms=100 parameter, a file with a High Dynamic Range (HDR) Histogram will be created: sleep.hdr
Finally, you can use HdrHistogram Plotter to generate and analyze the latency histogram.
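The latency fields of a report summarize a set of per-operation samples. As a rough illustration, independent of the library's HDR histogram, the sketch below computes minimum, maximum, mean and standard deviation from raw samples. It assumes that samples are in nanoseconds and that the unit-less Mean and StDv values in the report are nanoseconds too (1.04928e+07 would then be about 10.49 ms, which lies between the reported minimum and maximum). `LatencyStats` and `ComputeLatencyStats` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical summary type, not part of the CppBenchmark API.
struct LatencyStats
{
    std::uint64_t min_ns = 0;
    std::uint64_t max_ns = 0;
    double mean_ns = 0.0;
    double stdev_ns = 0.0;
};

// Compute min, max, mean and population standard deviation of latency samples.
inline LatencyStats ComputeLatencyStats(const std::vector<std::uint64_t>& samples_ns)
{
    LatencyStats stats;
    if (samples_ns.empty())
        return stats;

    stats.min_ns = *std::min_element(samples_ns.begin(), samples_ns.end());
    stats.max_ns = *std::max_element(samples_ns.begin(), samples_ns.end());

    double sum = 0.0;
    for (std::uint64_t sample : samples_ns)
        sum += static_cast<double>(sample);
    stats.mean_ns = sum / static_cast<double>(samples_ns.size());

    double squared = 0.0;
    for (std::uint64_t sample : samples_ns)
    {
        double delta = static_cast<double>(sample) - stats.mean_ns;
        squared += delta * delta;
    }
    stats.stdev_ns = std::sqrt(squared / static_cast<double>(samples_ns.size()));
    return stats;
}
```

A histogram additionally gives percentiles, which a plain mean and deviation cannot show. That is why the generated .hdr file is worth plotting when tail latency matters.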
#include "benchmark/cppbenchmark.h"
#include <chrono>
#include <cstdint>
#include <limits>
const auto settings = CppBenchmark::Settings().Operations(10000000).Latency(1, 1000000000, 5, false);
BENCHMARK("high_resolution_clock", settings)
{
    static uint64_t minresolution = std::numeric_limits<uint64_t>::max();
    static uint64_t maxresolution = std::numeric_limits<uint64_t>::min();
    static auto latency_timestamp = std::chrono::high_resolution_clock::now();
    static auto resolution_timestamp = std::chrono::high_resolution_clock::now();
    static uint64_t count = 0;
    // Get the current timestamp
    auto current = std::chrono::high_resolution_clock::now();
    // Update operations counter
    ++count;
    // Register latency metrics
    uint64_t latency = std::chrono::duration_cast<std::chrono::nanoseconds>(current - latency_timestamp).count();
    if (latency > 0)
    {
        context.metrics().AddLatency(latency / count);
        latency_timestamp = current;
        count = 0;
    }
    // Register resolution metrics
    uint64_t resolution = std::chrono::duration_cast<std::chrono::nanoseconds>(current - resolution_timestamp).count();
    if (resolution > 0)
    {
        if (resolution < minresolution)
        {
            minresolution = resolution;
            context.metrics().SetCustom("resolution-min", minresolution);
        }
        if (resolution > maxresolution)
        {
            maxresolution = resolution;
            context.metrics().SetCustom("resolution-max", maxresolution);
        }
        resolution_timestamp = current;
    }
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: high_resolution_clock
Attempts: 5
Operations: 10000000
-------------------------------------------------------------------------------
Phase: high_resolution_clock
Latency (Min): 38 ns/op
Latency (Max): 1.037 ms/op
Latency (Mean): 53.0462
Latency (StDv): 1136.37
Total time: 468.924 ms
Total operations: 10000000
Operations throughput: 21325385 ops/s
Custom values:
	resolution-max: 7262968
	resolution-min: 311
===============================================================================
If the benchmark is launched with the --histograms=100 parameter, a file with a High Dynamic Range (HDR) Histogram will be created: clock.hdr
Finally, you can use HdrHistogram Plotter to generate and analyze the latency histogram.
#include "benchmark/cppbenchmark.h"
#include <atomic>
// Create settings for the benchmark which will launch for each
// set of threads scaled from 1 thread to 8 threads (1, 2, 4, 8).
const auto settings = CppBenchmark::Settings()
    .ThreadsRange(
        1, 8, [](int from, int to, int& result)
        {
            int r = result;
            result *= 2;
            return r;
        }
    );
BENCHMARK_THREADS("std::atomic++", settings)
{
    static std::atomic<int> counter{0};
    counter++;
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: std::atomic++
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: std::atomic++(threads:1)
Average time: 19 ns/op
Minimal time: 19 ns/op
Maximal time: 20 ns/op
Total time: 2.124 s
Total operations: 111355461
Operations throughput: 52425884 ops/s
-------------------------------------------------------------------------------
Phase: std::atomic++(threads:1).thread
Average time: 5 ns/op
Minimal time: 5 ns/op
Maximal time: 5 ns/op
Total time: 586.191 ms
Total operations: 111355461
Operations throughput: 189964343 ops/s
-------------------------------------------------------------------------------
Phase: std::atomic++(threads:2)
Average time: 20 ns/op
Minimal time: 20 ns/op
Maximal time: 24 ns/op
Total time: 3.907 s
Total operations: 188624150
Operations throughput: 48270817 ops/s
-------------------------------------------------------------------------------
Phase: std::atomic++(threads:2).thread
Average time: 23 ns/op
Minimal time: 23 ns/op
Maximal time: 30 ns/op
Total time: 2.179 s
Total operations: 94312075
Operations throughput: 43270119 ops/s
-------------------------------------------------------------------------------
Phase: std::atomic++(threads:4)
Average time: 18 ns/op
Minimal time: 18 ns/op
Maximal time: 19 ns/op
Total time: 6.875 s
Total operations: 365529364
Operations throughput: 53160207 ops/s
-------------------------------------------------------------------------------
Phase: std::atomic++(threads:4).thread
Average time: 56 ns/op
Minimal time: 56 ns/op
Maximal time: 60 ns/op
Total time: 5.142 s
Total operations: 91382341
Operations throughput: 17771705 ops/s
-------------------------------------------------------------------------------
Phase: std::atomic++(threads:8)
Average time: 23 ns/op
Minimal time: 23 ns/op
Maximal time: 25 ns/op
Total time: 7.667 s
Total operations: 330867224
Operations throughput: 43153297 ops/s
-------------------------------------------------------------------------------
Phase: std::atomic++(threads:8).thread
Average time: 105 ns/op
Minimal time: 105 ns/op
Maximal time: 167 ns/op
Total time: 4.367 s
Total operations: 41358403
Operations throughput: 9468527 ops/s
===============================================================================
#include "benchmark/cppbenchmark.h"
#include <array>
#include <atomic>
// Create settings for the benchmark which will launch for each
// set of threads scaled from 1 thread to 8 threads (1, 2, 4, 8).
const auto settings = CppBenchmark::Settings()
    .ThreadsRange(
        1, 8, [](int from, int to, int& result)
        {
            int r = result;
            result *= 2;
            return r;
        }
    );
class Fixture1
{
protected:
    std::atomic<int> counter;
};
class Fixture2 : public virtual CppBenchmark::FixtureThreads
{
protected:
    std::array<int, 8> counter;
    void InitializeThread(CppBenchmark::ContextThreads& context) override
    {
        counter[CppBenchmark::System::CurrentThreadId() % counter.size()] = 0;
    }
    void CleanupThread(CppBenchmark::ContextThreads& context) override
    {
        // Thread cleanup code can be placed here...
    }
};
BENCHMARK_THREADS_FIXTURE(Fixture1, "Global counter", settings)
{
    counter++;
}
BENCHMARK_THREADS_FIXTURE(Fixture2, "Thread local counter", settings)
{
    counter[CppBenchmark::System::CurrentThreadId() % counter.size()]++;
}
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: Global counter
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: Global counter(threads:1).thread
Average time: 5 ns/op
Minimal time: 5 ns/op
Maximal time: 5 ns/op
Total time: 629.639 ms
Total operations: 119518816
Operations throughput: 189821077 ops/s
-------------------------------------------------------------------------------
Phase: Global counter(threads:2).thread
Average time: 18 ns/op
Minimal time: 18 ns/op
Maximal time: 24 ns/op
Total time: 1.860 s
Total operations: 101568823
Operations throughput: 54581734 ops/s
-------------------------------------------------------------------------------
Phase: Global counter(threads:4).thread
Average time: 57 ns/op
Minimal time: 57 ns/op
Maximal time: 66 ns/op
Total time: 4.552 s
Total operations: 79503346
Operations throughput: 17464897 ops/s
-------------------------------------------------------------------------------
Phase: Global counter(threads:8).thread
Average time: 103 ns/op
Minimal time: 103 ns/op
Maximal time: 143 ns/op
Total time: 4.601 s
Total operations: 44597477
Operations throughput: 9690967 ops/s
===============================================================================
Benchmark: Thread local counter
Attempts: 5
Duration: 5 seconds
-------------------------------------------------------------------------------
Phase: Thread local counter(threads:1).thread
Average time: 4 ns/op
Minimal time: 4 ns/op
Maximal time: 4 ns/op
Total time: 739.689 ms
Total operations: 166432112
Operations throughput: 225002770 ops/s
-------------------------------------------------------------------------------
Phase: Thread local counter(threads:2).thread
Average time: 9 ns/op
Minimal time: 9 ns/op
Maximal time: 10 ns/op
Total time: 1.061 s
Total operations: 113102777
Operations throughput: 106564314 ops/s
-------------------------------------------------------------------------------
Phase: Thread local counter(threads:4).thread
Average time: 20 ns/op
Minimal time: 20 ns/op
Maximal time: 21 ns/op
Total time: 1.944 s
Total operations: 94786108
Operations throughput: 48757481 ops/s
-------------------------------------------------------------------------------
Phase: Thread local counter(threads:8).thread
Average time: 25 ns/op
Minimal time: 25 ns/op
Maximal time: 39 ns/op
Total time: 1.784 s
Total operations: 71185751
Operations throughput: 39887088 ops/s
===============================================================================
#include "benchmark/cppbenchmark.h"
#include <mutex>
#include <queue>
const int items_to_produce = 10000000;
// Create settings for the benchmark which will create 1 producer and 1 consumer
// and launch the producer in an infinite loop.
const auto settings = CppBenchmark::Settings().Infinite().PC(1, 1);
class MutexQueueBenchmark : public CppBenchmark::BenchmarkPC
{
public:
    using BenchmarkPC::BenchmarkPC;
protected:
    void Initialize(CppBenchmark::Context& context) override
    {
        _queue = std::queue<int>();
        _count = 0;
    }
    void Cleanup(CppBenchmark::Context& context) override
    {
        // Benchmark cleanup code can be placed here...
    }
    void InitializeProducer(CppBenchmark::ContextPC& context) override
    {
        // Producer initialize code can be placed here...
    }
    void CleanupProducer(CppBenchmark::ContextPC& context) override
    {
        // Producer cleanup code can be placed here...
    }
    void InitializeConsumer(CppBenchmark::ContextPC& context) override
    {
        // Consumer initialize code can be placed here...
    }
    void CleanupConsumer(CppBenchmark::ContextPC& context) override
    {
        // Consumer cleanup code can be placed here...
    }
    void RunProducer(CppBenchmark::ContextPC& context) override
    {
        std::lock_guard<std::mutex> lock(_mutex);
        // Check if we need to stop production...
        if (_count >= items_to_produce) {
            _queue.push(0);
            context.StopProduce();
            return;
        }
        // Produce item
        _queue.push(++_count);
    }
    void RunConsumer(CppBenchmark::ContextPC& context) override
    {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_queue.size() > 0) {
            // Consume item
            int value = _queue.front();
            _queue.pop();
            // Check if we need to stop consumption...
            if (value == 0) {
                context.StopConsume();
                return;
            }
        }
    }
private:
    std::mutex _mutex;
    std::queue<int> _queue;
    int _count;
};
BENCHMARK_CLASS(MutexQueueBenchmark, "std::mutex+std::queue<int>", settings)
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: std::mutex+std::queue<int>
Attempts: 5
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:1,consumers:1)
Total time: 652.176 ms
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:1,consumers:1).producer
Average time: 50 ns/op
Minimal time: 50 ns/op
Maximal time: 53 ns/op
Total time: 509.201 ms
Total operations: 10000001
Operations throughput: 19638574 ops/s
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:1,consumers:1).consumer
Average time: 64 ns/op
Minimal time: 64 ns/op
Maximal time: 67 ns/op
Total time: 650.805 ms
Total operations: 10124742
Operations throughput: 15557246 ops/s
===============================================================================
#include "benchmark/cppbenchmark.h"
#include <mutex>
#include <queue>
const int items_to_produce = 10000000;
// Create settings for the benchmark which will create 1/2/4/8 producers and 1/2/4/8 consumers
// and launch all producers in an infinite loop.
const auto settings = CppBenchmark::Settings()
    .Infinite()
    .PCRange(
        1, 8, [](int producers_from, int producers_to, int& producers_result)
        {
            int r = producers_result;
            producers_result *= 2;
            return r;
        },
        1, 8, [](int consumers_from, int consumers_to, int& consumers_result)
        {
            int r = consumers_result;
            consumers_result *= 2;
            return r;
        }
    );
class MutexQueueBenchmark : public CppBenchmark::BenchmarkPC
{
public:
    using BenchmarkPC::BenchmarkPC;
protected:
    void Initialize(CppBenchmark::Context& context) override
    {
        _queue = std::queue<int>();
        _count = 0;
    }
    void Cleanup(CppBenchmark::Context& context) override
    {
        // Benchmark cleanup code can be placed here...
    }
    void InitializeProducer(CppBenchmark::ContextPC& context) override
    {
        // Producer initialize code can be placed here...
    }
    void CleanupProducer(CppBenchmark::ContextPC& context) override
    {
        // Producer cleanup code can be placed here...
    }
    void InitializeConsumer(CppBenchmark::ContextPC& context) override
    {
        // Consumer initialize code can be placed here...
    }
    void CleanupConsumer(CppBenchmark::ContextPC& context) override
    {
        // Consumer cleanup code can be placed here...
    }
    void RunProducer(CppBenchmark::ContextPC& context) override
    {
        std::lock_guard<std::mutex> lock(_mutex);
        // Check if we need to stop production...
        if (_count >= items_to_produce) {
            _queue.push(0);
            context.StopProduce();
            return;
        }
        // Produce item
        _queue.push(++_count);
    }
    void RunConsumer(CppBenchmark::ContextPC& context) override
    {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_queue.size() > 0) {
            // Consume item
            int value = _queue.front();
            _queue.pop();
            // Check if we need to stop consumption...
            if (value == 0) {
                context.StopConsume();
                return;
            }
        }
    }
private:
    std::mutex _mutex;
    std::queue<int> _queue;
    int _count;
};
BENCHMARK_CLASS(MutexQueueBenchmark, "std::mutex+std::queue<int>", settings)
BENCHMARK_MAIN()

Report fragment is the following:
===============================================================================
Benchmark: std::mutex+std::queue<int>
Attempts: 5
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:1,consumers:1)
Total time: 681.430 ms
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:1,consumers:1).producer
Average time: 42 ns/op
Minimal time: 42 ns/op
Maximal time: 120 ns/op
Total time: 427.075 ms
Total operations: 10000001
Operations throughput: 23415052 ops/s
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:1,consumers:1).consumer
Average time: 67 ns/op
Minimal time: 67 ns/op
Maximal time: 120 ns/op
Total time: 679.235 ms
Total operations: 10000001
Operations throughput: 14722437 ops/s
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:1,consumers:2)
Total time: 623.887 ms
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:1,consumers:2).producer
Average time: 58 ns/op
Minimal time: 58 ns/op
Maximal time: 103 ns/op
Total time: 582.786 ms
Total operations: 10000001
Operations throughput: 17158941 ops/s
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:1,consumers:2).consumer
Average time: 125 ns/op
Minimal time: 125 ns/op
Maximal time: 208 ns/op
Total time: 622.654 ms
Total operations: 4963799
Operations throughput: 7971989 ops/s
-------------------------------------------------------------------------------
...
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:8,consumers:4)
Total time: 820.237 ms
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:8,consumers:4).producer
Average time: 835 ns/op
Minimal time: 835 ns/op
Maximal time: 1.032 mcs/op
Total time: 606.745 ms
Total operations: 725823
Operations throughput: 1196256 ops/s
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:8,consumers:4).consumer
Average time: 213 ns/op
Minimal time: 213 ns/op
Maximal time: 264 ns/op
Total time: 755.649 ms
Total operations: 3543116
Operations throughput: 4688834 ops/s
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:8,consumers:8)
Total time: 824.811 ms
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:8,consumers:8).producer
Average time: 485 ns/op
Minimal time: 485 ns/op
Maximal time: 565 ns/op
Total time: 743.897 ms
Total operations: 1533043
Operations throughput: 2060824 ops/s
-------------------------------------------------------------------------------
Phase: std::mutex+std::queue<int>(producers:8,consumers:8).consumer
Average time: 489 ns/op
Minimal time: 489 ns/op
Maximal time: 648 ns/op
Total time: 676.364 ms
Total operations: 1382941
Operations throughput: 2044668 ops/s
===============================================================================
Dynamic benchmarks are useful when you have a working program and want to benchmark its critical parts or code fragments. In this case, just include the cppbenchmark.h header and use the BENCHCODE_SCOPE(), BENCHCODE_START(), BENCHCODE_STOP() and BENCHCODE_REPORT() macros. These macros are shortcuts to methods of the static Executor class, which you may also use directly as a singleton. All functionality provided for dynamic benchmarks is thread-safe and synchronized with a mutex, so each call adds a few nanoseconds of overhead.
#include "benchmark/cppbenchmark.h"
#include <chrono>
#include <thread>
#include <vector>
const int THREADS = 8;
void init()
{
    auto benchmark = BENCHCODE_SCOPE("Initialization");
    std::this_thread::sleep_for(std::chrono::seconds(2));
}
void calculate()
{
    auto benchmark = BENCHCODE_SCOPE("Calculate");
    for (int i = 0; i < 5; ++i) {
        auto phase1 = benchmark->StartPhase("Calculate.1");
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        phase1->StopPhase();
    }
    auto phase2 = benchmark->StartPhase("Calculate.2");
    {
        auto phase21 = benchmark->StartPhase("Calculate.2.1");
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        phase21->StopPhase();
        auto phase22 = benchmark->StartPhase("Calculate.2.2");
        std::this_thread::sleep_for(std::chrono::milliseconds(300));
        phase22->StopPhase();
    }
    phase2->StopPhase();
    for (int i = 0; i < 3; ++i) {
        auto phase3 = benchmark->StartPhase("Calculate.3");
        std::this_thread::sleep_for(std::chrono::milliseconds(400));
        phase3->StopPhase();
    }
}
void cleanup()
{
    BENCHCODE_START("Cleanup");
    std::this_thread::sleep_for(std::chrono::seconds(1));
    BENCHCODE_STOP("Cleanup");
}
int main(int argc, char** argv)
{
    // Initialization
    init();
    // Start parallel calculations
    std::vector<std::thread> threads;
    for (int i = 0; i < THREADS; ++i)
        threads.emplace_back(calculate);
    // Wait for all threads
    for (auto& thread : threads)
        thread.join();
    // Cleanup
    cleanup();
    // Report benchmark results
    BENCHCODE_REPORT();
    return 0;
}
Report fragment is the following:
===============================================================================
Benchmark: Initialization
Attempts: 1
Operations: 1
-------------------------------------------------------------------------------
Phase: Initialization
Total time: 2.002 s
===============================================================================
Benchmark: Calculate
Attempts: 1
Operations: 1
-------------------------------------------------------------------------------
Phase: Calculate
Total time: 2.200 s
-------------------------------------------------------------------------------
Phase: Calculate.1
Average time: 100.113 ms/op
Minimal time: 93.337 ms/op
Maximal time: 107.303 ms/op
Total time: 500.565 ms
Total operations: 5
Operations throughput: 9 ops/s
-------------------------------------------------------------------------------
Phase: Calculate.2
Total time: 499.420 ms
-------------------------------------------------------------------------------
Phase: Calculate.2.1
Total time: 199.514 ms
-------------------------------------------------------------------------------
Phase: Calculate.2.2
Total time: 299.755 ms
-------------------------------------------------------------------------------
Phase: Calculate.3
Average time: 399.920 ms/op
Minimal time: 399.726 ms/op
Maximal time: 400.365 ms/op
Total time: 1.199 s
Total operations: 3
Operations throughput: 2 ops/s
===============================================================================
Benchmark: Cleanup
Attempts: 1
Operations: 1
-------------------------------------------------------------------------------
Phase: Cleanup
Total time: 1.007 s
===============================================================================
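To make the cost of mutex synchronization concrete, here is a minimal sketch of how a scope-based timer with a mutex-protected registry could work. It is not CppBenchmark's implementation: `TimingRegistry`, `ScopeTimer` and `RunWorkers` are hypothetical names introduced only for illustration, and the sketch uses nothing beyond the C++ standard library.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical registry (not part of CppBenchmark): collects named timings
// from any thread, guarding the shared map with a mutex.
class TimingRegistry
{
public:
    // Record one measured duration under the given name
    void Add(const std::string& name, std::chrono::nanoseconds elapsed)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        Entry& entry = _entries[name];
        entry.total += elapsed;
        ++entry.count;
    }

    // Number of measurements recorded under the given name
    std::uint64_t Count(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _entries.find(name);
        return (it != _entries.end()) ? it->second.count : 0;
    }

    // Print a simple per-name summary
    void Report(std::ostream& os)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        for (const auto& item : _entries)
            os << item.first << ": " << item.second.count << " calls, "
               << item.second.total.count() << " ns total\n";
    }

private:
    struct Entry
    {
        std::chrono::nanoseconds total{0};
        std::uint64_t count = 0;
    };
    std::mutex _mutex;
    std::map<std::string, Entry> _entries;
};

// Hypothetical RAII timer: starts on construction, records on destruction
class ScopeTimer
{
public:
    ScopeTimer(TimingRegistry& registry, std::string name)
        : _registry(registry), _name(std::move(name)),
          _start(std::chrono::steady_clock::now()) {}
    ~ScopeTimer()
    {
        auto elapsed = std::chrono::steady_clock::now() - _start;
        _registry.Add(_name, std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }
    ScopeTimer(const ScopeTimer&) = delete;
    ScopeTimer& operator=(const ScopeTimer&) = delete;

private:
    TimingRegistry& _registry;
    std::string _name;
    std::chrono::steady_clock::time_point _start;
};

// Time `iterations` scopes named "work" on each of `threads` threads
// and return how many measurements were recorded
std::uint64_t RunWorkers(TimingRegistry& registry, int threads, int iterations)
{
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back([&registry, iterations]() {
            for (int j = 0; j < iterations; ++j) {
                ScopeTimer timer(registry, "work");
            }
        });
    for (auto& worker : workers)
        worker.join();
    return registry.Count("work");
}
```

In this sketch every timed scope takes the lock once, when the timer is destroyed, so the synchronization cost is paid per measurement rather than per line of timed code.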
When you create and build a benchmark you can run it with the following command line options:
- --version - Show program's version number and exit
 - -h, --help - Show this help message and exit
 - -f FILTER, --filter=FILTER - Filter benchmarks by the given regexp pattern
 - -l, --list - List all available benchmarks
 - -o OUTPUT, --output=OUTPUT - Output format (console, csv, json). Default: console
 - -q, --quiet - Launch in quiet mode. No progress will be shown!
 - -r HISTOGRAMS, --histograms=HISTOGRAMS - Create High Dynamic Range (HDR) Histogram files with a given resolution. Default: 0
 


