**Describe the bug**
Hello, I was running some benchmarks on my Python codebase on different hardware.

It turns out there were massive unexplained differences, up to 10 times slower in some environments, while the difference between older and newer CPUs should only be about 2 or 3 times.

After many, many days of debugging, I've finally found the root cause: the issue is running tests with code coverage. Running some Python tests with branch code coverage is 5 times slower, or 3 times slower without branch coverage.
**To Reproduce**

Tests are running with the CTracer. (I saw the other bug tickets where you said to check that first :D)
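For reference, here is the quick check I used to confirm the C tracer is importable at all (a minimal sketch; `coverage.tracer` is the compiled extension module coverage.py ships, as far as I can tell):

```python
# Check whether coverage.py's compiled C tracer can be imported.
# If this import fails, coverage silently falls back to the much
# slower pure-Python tracer, which would be a slowdown on its own.
try:
    from coverage.tracer import CTracer  # C extension shipped with coverage.py
    ctracer_available = True
except ImportError:
    ctracer_available = False

print("CTracer available:", ctracer_available)
```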
Close to latest:

- Python 3.11.9
- pytest 7.4.4 (latest last year; there is an 8.x now)
- coverage 7.2.7 (same result on 7.6.10, the latest)

Older Python for comparison:

- Python 3.8.12
- pytest 3.10.1
- coverage 5.0.3
**Source code**

`test_empty.py` is an empty test:

```python
def test_empty():
    pass
```

`test_multiplication.py` is a single math operation:

```python
def test_multiplication():
    9 ** 10_000_000
```

`test_prime.py` is the script below, computing prime numbers up to N.
```python
import socket
import time


def is_prime(n):
    """Check if a number is prime."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:  # Exclude even numbers
        return False
    for i in range(3, int(n ** 0.5) + 1, 2):  # Only check odd divisors
        if n % i == 0:
            return False
    return True


def cpu_benchmark_find_primes(limit):
    """Benchmark CPU by finding primes and measuring the time taken."""
    print(f"Starting CPU benchmark on {socket.getfqdn()}...")
    start_time = time.time()
    # Calculate all prime numbers up to a large value
    primes = [n for n in range(2, limit) if is_prime(n)]
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Found {len(primes)} primes up to {limit}")
    print(f"Time taken: {elapsed_time:.3f} seconds")
    return elapsed_time


def test_find_primes():
    cpu_benchmark_find_primes(3_000_000)


if __name__ == "__main__":
    test_find_primes()
```
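To illustrate why the two test shapes behave so differently under tracing, here is a small sketch (using plain `sys.settrace`, not coverage.py's actual tracer) that counts the `'line'` events each shape generates: a single expensive expression fires one event no matter how long it runs, while a loop fires events on every iteration.

```python
import sys

def count_line_events(func):
    """Run func() under a trace function and count 'line' events."""
    count = 0
    def tracer(frame, event, arg):
        nonlocal count
        if event == "line":
            count += 1
        return tracer  # keep receiving per-line events in new frames
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return count

def one_expression():
    x = 9 ** 100_000  # one expensive line -> roughly one 'line' event

def small_loop():
    for _ in range(1_000):  # every iteration re-fires 'line' events
        pass

n_one = count_line_events(one_expression)
n_loop = count_line_events(small_loop)
print(f"one_expression: {n_one} line events, small_loop: {n_loop} line events")
```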
**Actual behavior**

Best of 5 runs. The command is basically `<coverage-with-options> pytest <testfile>`.

Full test command with hyperfine:

```shell
hyperfine -N --runs 3 --parameter-list wrapper "","coverage run","coverage run --branch","coverage run --omit=tests/*","coverage run --branch --omit=tests/*" --parameter-list testfile test_empty.py,test_multiplication.py,test_prime.py --parameter-list pythonversion 38,311 --show-output --export-markdown=result.md --export-csv=result.csv -- '{wrapper} /turbo/rmorotti/pyenvs/python-benchmark-{pythonversion}/bin/pytest tests/unit/{testfile} --durations=0'
```
**Expected behavior**

Obviously, coverage should have as little overhead as possible.

On the `9**N` operation, the overhead is very small; that's fine.

On the Python code computing prime numbers up to N, the overhead is huge: as much as 5 times with branch coverage enabled. Even with coverage disabled on the test file via `--omit=tests/*`, the overhead is still 3 times, which makes no sense!

What could possibly explain that?
**Additional context**

The pure pytest run without coverage takes 5.695 seconds on Python 3.8 vs 4.821 seconds on Python 3.11. With code coverage, the runtime increases massively, and to roughly the same duration on both, despite them being totally different versions with totally different interpreters. I find that extremely odd!

Is it possible that there is a large fixed overhead with code coverage?

Maybe important to know: Python 3.11+ is 10%-30% faster on most operations, and Python 3.11 is up to 4 times faster on tight loops and list comprehensions. If code coverage has some fixed overhead around loops, that could have a massive effect on recent Python versions.
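If it helps, here is a minimal sketch of that hypothesis using a no-op `sys.settrace` callback (the same mechanism coverage.py's tracer hooks into, though not its actual implementation): even an empty per-line callback makes a tight loop several times slower, because the callback cost is paid on every executed line.

```python
import sys
import time

def tight_loop(n):
    total = 0
    for i in range(n):  # each iteration fires 'line' events when traced
        total += i
    return total

def timed(func, *args):
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start

def noop_tracer(frame, event, arg):
    return noop_tracer  # do nothing, just stay attached for line events

plain = timed(tight_loop, 200_000)

sys.settrace(noop_tracer)
try:
    traced = timed(tight_loop, 200_000)
finally:
    sys.settrace(None)

print(f"untraced {plain:.4f}s, traced {traced:.4f}s, slowdown {traced / plain:.1f}x")
```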