**Describe the bug**
Hello, I was running some benchmarks on my Python codebase on different hardware.

It turns out there were massive unexplained differences, up to 10 times slower in some environments, while the difference between older and newer CPUs should only be about 2 or 3 times.

After many, many days of debugging, I've finally found the root cause: the issue is running tests with code coverage. Running some Python tests with branch code coverage is 5 times slower, or 3 times slower without branch coverage.
**To Reproduce**

Tests are running with the CTracer. (I saw the other bug tickets where you said to check that first :D)
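For reference, here is the quick check I used to confirm the C tracer is importable at all (a minimal sketch; `coverage.tracer` is the compiled extension module coverage.py ships, as far as I can tell):

```python
# Check whether coverage.py's compiled C tracer can be imported.
# If this import fails, coverage silently falls back to the much
# slower pure-Python tracer, which would be a slowdown on its own.
try:
    from coverage.tracer import CTracer  # C extension shipped with coverage.py
    ctracer_available = True
except ImportError:
    ctracer_available = False

print("CTracer available:", ctracer_available)
```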
Close to latest:

- Python 3.11.9
- pytest 7.4.4 (latest last year; there is an 8.x now)
- coverage 7.2.7 (same result on 7.6.10, the latest)

Older Python for comparison:

- Python 3.8.12
- pytest 3.10.1
- coverage 5.0.3
**Source code**

`test_empty.py` is an empty test:

```python
def test_empty():
    pass
```

`test_multiplication.py` is a single math operation:

```python
def test_multiplication():
    9 ** 10_000_000
```

`test_prime.py` is the script below, computing prime numbers up to N.
```python
import socket
import time


def is_prime(n):
    """Check if a number is prime."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:  # Exclude even numbers
        return False
    for i in range(3, int(n ** 0.5) + 1, 2):  # Only check odd divisors
        if n % i == 0:
            return False
    return True


def cpu_benchmark_find_primes(limit):
    """Benchmark CPU by finding primes and measuring the time taken."""
    print(f"Starting CPU benchmark on {socket.getfqdn()}...")
    start_time = time.time()
    # Calculate all prime numbers up to a large value
    primes = [n for n in range(2, limit) if is_prime(n)]
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Found {len(primes)} primes up to {limit}")
    print(f"Time taken: {elapsed_time:.3f} seconds")
    return elapsed_time


def test_find_primes():
    cpu_benchmark_find_primes(3_000_000)


if __name__ == "__main__":
    test_find_primes()
```
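To illustrate why the two test shapes behave so differently under tracing, here is a small sketch (using plain `sys.settrace`, not coverage.py's actual tracer) that counts the `'line'` events each shape generates: a single expensive expression fires one event no matter how long it runs, while a loop fires events on every iteration.

```python
import sys

def count_line_events(func):
    """Run func() under a trace function and count 'line' events."""
    count = 0
    def tracer(frame, event, arg):
        nonlocal count
        if event == "line":
            count += 1
        return tracer  # keep receiving per-line events in new frames
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return count

def one_expression():
    x = 9 ** 100_000  # one expensive line -> roughly one 'line' event

def small_loop():
    for _ in range(1_000):  # every iteration re-fires 'line' events
        pass

n_one = count_line_events(one_expression)
n_loop = count_line_events(small_loop)
print(f"one_expression: {n_one} line events, small_loop: {n_loop} line events")
```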
**Actual behavior**

Best of 5 runs. The command is basically `<coverage-with-options> pytest <testfile>`.

Full test command with hyperfine:

```shell
hyperfine -N --runs 3 --parameter-list wrapper "","coverage run","coverage run --branch","coverage run --omit=tests/*","coverage run --branch --omit=tests/*" --parameter-list testfile test_empty.py,test_multiplication.py,test_prime.py --parameter-list pythonversion 38,311 --show-output --export-markdown=result.md --export-csv=result.csv -- '{wrapper} /turbo/rmorotti/pyenvs/python-benchmark-{pythonversion}/bin/pytest tests/unit/{testfile} --durations=0'
```
**Expected behavior**

Obviously, coverage should have as little overhead as possible.

On the `9**N` operation, the overhead is very small; that's fine.

On the Python code computing prime numbers up to N, the overhead is huge: as much as 5 times with branch coverage enabled. Even with coverage disabled on the test file via `--omit=tests/*`, the overhead is still 3 times, which makes no sense!

What could possibly explain that?
**Additional context**

The pure pytest run without coverage takes 5.695 seconds on Python 3.8 vs 4.821 seconds on Python 3.11. With code coverage, the runtime increases massively, and to roughly the same duration on both, despite them being totally different versions with totally different interpreters. I find that extremely odd!

Is it possible that there is a large fixed overhead with code coverage?

Maybe important to know: Python 3.11+ is 10%-30% faster on most operations, and Python 3.11 is up to 4 times faster on tight loops and list comprehensions. If code coverage has some fixed overhead around loops, that could have a massive effect on recent Python versions.
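If it helps, here is a minimal sketch of that hypothesis using a no-op `sys.settrace` callback (the same mechanism coverage.py's tracer hooks into, though not its actual implementation): even an empty per-line callback makes a tight loop several times slower, because the callback cost is paid on every executed line.

```python
import sys
import time

def tight_loop(n):
    total = 0
    for i in range(n):  # each iteration fires 'line' events when traced
        total += i
    return total

def timed(func, *args):
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start

def noop_tracer(frame, event, arg):
    return noop_tracer  # do nothing, just stay attached for line events

plain = timed(tight_loop, 200_000)

sys.settrace(noop_tracer)
try:
    traced = timed(tight_loop, 200_000)
finally:
    sys.settrace(None)

print(f"untraced {plain:.4f}s, traced {traced:.4f}s, slowdown {traced / plain:.1f}x")
```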