feat: Pytest benchmark for comparing against other engines locally #10

Merged
jiayuasu merged 9 commits into apache:main from petern48:pytest-benchmark on Sep 3, 2025

Conversation

@petern48 (Contributor) commented Sep 2, 2025

See the new benchmarks/README.md for how to run the benchmarks and what the output looks like.

@petern48 (Contributor, Author) commented Sep 2, 2025

Just pushing this somewhere for now. There are other benchmark library options too, so we don't need to commit to this one. I just found this one very easy to set up and use.

@paleolimbot (Member) left a comment

Thanks!

Just a few suggestions to get started. The real work here is writing the actual queries, and I'm happy to run them however works for you!


# Set up tables
num_rows = 10000
create_points_query = (
    f"CREATE TABLE points AS "
    f"SELECT ST_GeomFromText('POINT(0 0)') AS geom FROM range({num_rows})"
)
@paleolimbot (Member) commented:

The DBEngine subclass has this abstracted already, so you can create a table from a GeoParquet file or a GeoPandas data frame. You can use the geoarrow_data fixture to write benchmarks against actual data, or you can use the sd_random_geometry() table function to generate it (Kristin's join integration tests are a great example).

Probably synthetic data makes sense here: points, segments (linestrings with a vertex count of 2), polygon, complex_linestring, complex_polygon. The number of batches could be configurable so that you can run tiny benchmarks or big benchmarks (this is what we do in Rust, too).
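For concreteness, a rough sketch of what that setup could look like (everything below is hypothetical: the create_table_from_query() method name, the sd_random_geometry() arguments, and the batch-count knob are assumptions, not the real API):

# Hypothetical sketch -- the create_table_from_query() method and the
# sd_random_geometry() signature are assumptions, not the real API.
NUM_BATCHES = 10  # could be made configurable for tiny vs. big benchmark runs

def setup_synthetic_tables(engine):
    # One table per geometry flavor suggested above
    for table in ["points", "segments", "polygons",
                  "complex_linestrings", "complex_polygons"]:
        engine.create_table_from_query(
            name=table,
            query=f"SELECT geometry FROM sd_random_geometry('{table}', {NUM_BATCHES})",
        )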

@petern48 (Contributor, Author) replied:

How exactly would you want to quantify complex vs. non-complex?

@paleolimbot (Member) replied:

It looks like you found vertices_per_linestring_range. I use the numbers 10 ("simple") and 500 ("complex") in the Rust benchmarks, which is sort of arbitrary but did the trick of weeding out predicate implementations that weren't using a prepared geometry (particularly when one side was a scalar). Totally optional!
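For reference, a tiny sketch of how those numbers might translate into table setup (only vertices_per_linestring_range and the 10/500 values come from this thread; the dict shape is illustrative):

# Illustrative mapping of "simple" vs. "complex" onto vertex counts;
# 10 and 500 follow the values used in the Rust benchmarks.
VERTICES_PER_LINESTRING_RANGE = {
    "simple": (10, 10),
    "complex": (500, 500),
}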

@petern48 (Contributor, Author) commented Sep 2, 2025

I've changed it so that we generate columns of random geometries: points_10_000, polygons_10_000, polygons_100_000, etc. I'm not sure 1. how we should make this configurable or 2. to what extent we should make it configurable.

If we do different geometry types, simple/complex, and number of geometries, I feel that's a lot of dimensions. How much do we care to drill down?

Looking at the current implementation of test_st_area (which is parametrized, unlike the rest), we can group by table (dataset size, etc.) and compare the engines at a more granular level.
(Notice DuckDB wins for one of the simpler datasets here, although SedonaDB is faster for the rest and overall.)
pytest --benchmark-group-by=param:table test_functions.py::TestBenchFunctions::test_st_area
[screenshot: benchmark results grouped by table]

Or we can just benchmark at the function level (e.g. st_buffer):
pytest --benchmark-group-by=func test_functions.py::TestBenchFunctions::test_st_buffer
[screenshot: benchmark results grouped by function]
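For context, a parametrized benchmark along these lines might look roughly like the sketch below. The get_engine() helper, engine.execute() method, and exact table names are assumptions inferred from the test IDs in the output; only the pytest-benchmark `benchmark` fixture is the library's real API.

import pytest

# Assumed table/engine names, inferred from test IDs like
# test_st_area[polygons_10_000-SedonaDB]
TABLES = ["points_10_000", "polygons_10_000", "polygons_100_000"]
ENGINES = ["SedonaDB", "DuckDB"]

class TestBenchFunctions:
    @pytest.mark.parametrize("engine_name", ENGINES)
    @pytest.mark.parametrize("table", TABLES)
    def test_st_area(self, benchmark, table, engine_name):
        engine = get_engine(engine_name)  # hypothetical helper
        # pytest-benchmark's `benchmark` fixture runs and times this callable
        benchmark(lambda: engine.execute(f"SELECT ST_Area(geom) FROM {table}"))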

@paleolimbot (Member) commented:

Awesome!

> How much do we care to drill down?

Not that far! The tests are all about correctness and corner cases...here we can stick to the most common cases. Most functions shouldn't need more than one or two benchmarks (one on a simple geometry, which benchmarks our per-geometry overhead, and one on a complex geometry, which is more a test of the underlying implementation).

> I'm not sure 1. how we should make this configurable or 2. to what extent we should make it configurable.

No need to make it configurable now, but maybe rename the tables to _small and _large in case we decide to change those numbers?

petern48 changed the title from "WIP: Pytest benchmark proposal" to "feat: Pytest benchmark for comparing against other engines locally" on Sep 2, 2025
@petern48 (Contributor, Author) commented Sep 2, 2025

I added the separate 'simple' and 'complex' tables and removed the small sizing (using the original large size for everything). I didn't see much value from that dimension at the moment, and this mimics the Rust tests most closely. I also don't think it makes sense to integrate with CI for now, since there might still be options other than pytest-benchmark that are better suited to our needs. I just wanted a nice quick tool for comparing against different engines locally.

Here's an example of how it can be used locally at the moment. The command below groups the simple and complex results separately.
pytest --benchmark-group-by=func,param:table test_functions.py::TestBenchFunctions::test_st_area

[screenshot: benchmark results grouped by function and table]

WDYT?

petern48 requested a review from paleolimbot on September 2, 2025 at 22:50
petern48 marked this pull request as ready for review on September 2, 2025 at 22:50
@paleolimbot (Member) left a comment

This is a great start...thank you!

Can you add benchmarks/README.md (with the license header in a comment, because Apache) with a brief description of the benchmarks and how to run them?

@paleolimbot (Member) left a comment

Thank you!

jiayuasu merged commit 8f00164 into apache:main on Sep 3, 2025 (2 checks passed)
petern48 deleted the pytest-benchmark branch on September 3, 2025 at 21:25
@Kontinuation (Member) commented Sep 5, 2025

I ran the benchmark and found that sedona-db uses all the CPU cores to run the benchmarking query, while DuckDB uses only a single core (the CPython main thread). This makes the benchmark results of sedona-db and DuckDB not directly comparable.

Have you run into this issue before? Should we configure the benchmarked engines to force single-threaded query execution?

$ pytest --benchmark-group-by=param:table test_predicates.py::TestBenchPredicates::test_st_contains
============================================================================================ test session starts =============================================================================================
platform darwin -- Python 3.13.4, pytest-8.4.1, pluggy-1.6.0
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/bopeng/workspace/wherobots/sedona-db/benchmarks
plugins: anyio-4.10.0, benchmark-5.1.0
collected 2 items                                                                                                                                                                                            

test_predicates.py ..                                                                                                                                                                                  [100%]


---------------------------------------------------------------------------------------- benchmark 'table=polygons_simple': 2 tests ---------------------------------------------------------------------------------------
Name (time in ms)                                     Min                   Max                  Mean             StdDev                Median                IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_st_contains[polygons_simple-SedonaDB]       171.6143 (1.0)        197.6855 (1.0)        183.8239 (1.0)      11.2549 (1.13)       184.8224 (1.0)      19.9745 (1.70)          2;0  5.4400 (1.0)           5           1
test_st_contains[polygons_simple-DuckDB]       1,113.0932 (6.49)     1,140.0075 (5.77)     1,125.7878 (6.12)      9.9178 (1.0)      1,123.1109 (6.08)     11.7343 (1.0)           2;0  0.8883 (0.16)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
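For what it's worth, pinning DuckDB to one thread is straightforward via its threads setting; the equivalent knob for SedonaDB is an assumption here, since it depends on how its session configuration is exposed:

import duckdb

# DuckDB: cap query execution at a single thread (a real DuckDB setting)
con = duckdb.connect()
con.execute("SET threads TO 1;")

# SedonaDB: hypothetical -- the exact option depends on how SedonaDB exposes
# its session configuration; it might resemble something like:
# ctx.set("datafusion.execution.target_partitions", "1")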

@jiayuasu (Member) commented Sep 5, 2025

Does this mean all the performance numbers we saw yesterday are wrong?

@paleolimbot (Member) commented:

> Does this mean all the performance numbers we saw yesterday are wrong?

I don't doubt the diagnostics here, but I would be surprised if DuckDB's Python package were configured to always use one thread by default, and that this went uncaught for the entire lifecycle of the 1.3 release. There are a number of things we need to consider on top of yesterday's numbers, including this!

@petern48 (Contributor, Author) commented Sep 5, 2025

Very good catch 😬
