Update performance testing docs #3911

Merged 2 commits on Feb 15, 2025
Binary file removed docs/source/_static/t3.medium_gp2_r100.png
Binary file added docs/source/_static/t3.medium_gp3_r100.png
151 changes: 128 additions & 23 deletions docs/source/hardware_performance.rst
@@ -47,49 +47,78 @@ we often hear is *How many test execution results can Kiwi TCMS deal with?*

The information below has been gathered by using the following environment:

- Client: AWS t3.medium in us-east-1a (same availability zone as server)
- Server: AWS t3.medium in us-east-1a, 30GB gp3 disk, defaults to 3000 IOPS,
  default throughput 125 MiB/s
- Kiwi TCMS v14.0 via ``docker compose up``
- Database is ``mariadb:11.6.2`` with a persistent volume backed onto
  the host filesystem
- Host OS - Ubuntu 24.04, freshly provisioned, no changes from defaults
- ``api_write_test.py`` @
  `748787a <https://github.com/kiwitcms/Kiwi/blob/748787ad37702ed4df2554330eef987ec40268b8/tests/performance/api_write_test.py>`_
  with ``RANGE_SIZE=100``;
  ``locust --users 1 --spawn-rate 1 --run-time 60m --locustfile api_write_test.py``
- For each invocation ``api_write_test.py`` creates a new *Product*, *Version*,
  *Build* and *TestPlan*. The test plan contains ``RANGE_SIZE x test cases``, then
  ``RANGE_SIZE x test runs``, each containing the previous test cases, and finally
  updates the results for all of them. This simulates a huge test matrix against
  the same test plan/product/version/build, e.g. testing on multiple different
  platforms (browser versions + OS combinations, for example)
- The total number of test execution results is ``RANGE_SIZE^2``
- The total number of API calls is ``10 + 3*RANGE_SIZE + 2*RANGE_SIZE^2``
- Single client, no other server load in parallel

For ``RANGE_SIZE=100`` we've got ``10000`` test execution results and
``20310`` API calls in a single script invocation!
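The arithmetic above can be double-checked with a few lines of Python (a sketch; ``workload_counts`` is an illustrative helper name, not part of the test script):

```python
# Sketch: verify the call arithmetic for one invocation of the API write workload.
def workload_counts(range_size: int) -> tuple[int, int]:
    """Return (test_execution_results, api_calls) for one script invocation."""
    results = range_size ** 2  # RANGE_SIZE test runs x RANGE_SIZE test cases each
    calls = 10 + 3 * range_size + 2 * range_size ** 2
    return results, calls

print(workload_counts(100))  # -> (10000, 20310)
```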

The results we've got are:

- 92000+ API calls/hour
- 45000+ test executions recorded/hour
- 25+ requests/second
- 33 ms/request (average); 73 ms/request (95%)
- 0 requests failed

|t3.medium metrics|
|t3.medium locust graph|
|t3.medium locust table|

.. important::

    Using a vanilla ``postgres:17.2`` as the database container resulted in worse
    performance out of the box. For the same CPU/system load we saw numbers which
    were only 60% of the ones reported above. Bombarding Kiwi TCMS with 2 Locust
    users produced a comparable outcome at the expense of CPU load averaging 90%
    on the same hardware! This is due to several factors:

    - More rigorous constraint checking in Postgres
    - Postgres is good at handling "long connections" while
      MariaDB is better at handling "short connections"
    - Connecting to Postgres is slower than connecting to MariaDB
      (think process vs. thread)
    - Missing DB connection pooling as part of the application
      framework until very recently
    - Possibly Postgres performing more data analysis & optimization behind the
      scenes

    Aside from involving a DBA to monitor and tune the performance of your
    Postgres database to match the behavior of Kiwi TCMS there is little we can
    do about it!
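For reference, framework-level pooling recently landed in Django itself (5.1+, with psycopg 3). A minimal, hypothetical settings fragment - not Kiwi TCMS's actual configuration - looks like this:

```python
# Hypothetical Django settings fragment: built-in Postgres connection pooling
# (Django >= 5.1 with psycopg 3). Database name and values are illustrative only.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "kiwi",
        "OPTIONS": {
            # Reuse server connections instead of paying the per-request,
            # process-based connection cost mentioned above.
            "pool": True,
        },
    }
}
```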


.. important::

    In the past (v12.0) we've also experimented with an *i3.large* storage
    optimized instance which has Non-Volatile Memory Express (NVMe) SSD-backed
    storage optimized for low latency and very high random I/O performance. We had
    to ``mkfs.xfs /dev/nvme0n1 && mount /dev/nvme0n1 /var/lib/docker`` before
    starting the containers.

    While you can see that ``nvme`` disk latency is an order of magnitude faster
    (< 0.1 ms), with the occasional peak from the root filesystem, the overall
    application performance didn't change a lot. The times for ``RANGE_SIZE=30``
    improved but the times for ``RANGE_SIZE=100`` worsened a bit.

|i3.large metrics|

@@ -175,10 +204,86 @@ to transfer the actual information:
Firefox timing metrics are explained in
`Mozilla's documentation <https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor/request_details#timings_tab>`_

Parallel user sessions performance
----------------------------------

Another important question is *How many parallel users can Kiwi TCMS support?*
The answer is heavily dependent on what these users are actually doing and how
they interact with the application, which varies greatly between different
teams and organizations.

To help answer this question we've created the ``web_simulation_test.py`` script
which uses Playwright to simulate realistic user activity as if it came
from a browser. The script implements the top 10 most common activities, such as
viewing the dashboard page, creating test plans and test cases, and reporting
execution results in test runs. These actions and their frequency were derived
from our `anonymous analytics metrics <https://kiwitcms.org/privacy/>`_!
The implementation includes random sleeps and a varying number of artifacts to
simulate plausible human interaction. In all scenarios pages were left to
load and exercise their default JavaScript actions - for example, search pages
will query and fully load all of their results!
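The frequency-weighted action model can be sketched in plain Python (the action names and weights below are made up for illustration - the real ones in ``web_simulation_test.py`` come from the analytics data):

```python
import random

# Illustrative action weights; the real frequencies are derived from
# anonymous analytics metrics, not these made-up numbers.
ACTIONS = {
    "view_dashboard": 30,
    "search_test_cases": 20,
    "view_test_run": 15,
    "report_execution_result": 15,
    "create_test_case": 8,
    "create_test_plan": 5,
    "clone_test_cases": 4,
    "create_test_run": 3,
}

def pick_action(rng):
    """Pick the next simulated user action, weighted by observed frequency."""
    names = list(ACTIONS)
    return rng.choices(names, weights=[ACTIONS[n] for n in names], k=1)[0]

rng = random.Random(42)  # fixed seed so the simulation is reproducible
sample = [pick_action(rng) for _ in range(1000)]
print(max(ACTIONS, key=sample.count))  # the most heavily weighted action dominates
```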

.. important::

    The information below has been gathered by using the following environment:

    - Client: AWS c6a.metal (192 CPU cores) in us-east-1a (same availability
      zone as server)
    - Server: AWS t3.medium in us-east-1a, 30GB gp3 disk, defaults to 3000 IOPS,
      default throughput 125 MiB/s
    - Kiwi TCMS v14.0 via ``docker compose up``
    - Database is ``mariadb:11.6.2`` with a persistent volume backed onto
      the host filesystem
    - Host OS - Ubuntu 24.04, freshly provisioned, no changes from defaults
    - ``web_simulation_test.py`` @
      `87dd61f <https://github.com/kiwitcms/Kiwi/blob/87dd61ff9955e79de4604259bc29ab7a923f0730/tests/performance/web_simulation_test.py>`_;
      ``locust --processes -1 --users 300 --spawn-rate 0.33 --run-time 60m --locustfile web_simulation_test.py``
    - ~ 15 min ramp-up of all users; then steady load
    - Existing state: 20 x TestPlan; 200 x TestCase; 200 x TestRun
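The ~15 min ramp-up figure follows directly from the ``locust`` flags above:

```python
# Sketch: ramp-up time implied by --users 300 --spawn-rate 0.33
users = 300
spawn_rate = 0.33  # new users spawned per second
ramp_up_min = users / spawn_rate / 60
print(round(ramp_up_min, 1))  # -> 15.2 minutes before steady load
```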

The results we've got are:

- 300 users were served with minimal errors; < 0.01% of all requests
- Errors occurred 2 mins before the end of the testing session; they could also
  be related to other processes in the host OS eating up available CPU
- Cloning (usually more than 1 TC) is the heaviest operation, followed by
  login and creating a new TR
- Performance for individual pages must be analyzed separately
- Median response time is relatively stable
- 95th percentile response time graph contains occasional spikes
- We've seen more spikes when the ramp-up period is shorter
- RAM usage is relatively constant; stayed < 1 GiB
- CPU load is between 20-60%
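The observation that the median stays flat while the 95th percentile spikes is easy to reproduce with toy numbers (the latencies below are made up, not measured):

```python
import statistics

# Toy latency sample (ms): mostly fast responses plus a few slow outliers,
# loosely mimicking the shape of the response time graphs.
latencies = [40] * 90 + [45] * 5 + [800] * 5

median = statistics.median(latencies)
p95 = sorted(latencies)[95]  # simple nearest-rank 95th percentile for 100 samples
print(median, p95)  # outliers barely move the median but dominate p95
```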

|300 users t3.medium datadog|
|300 users t3.medium locust graph|
|300 users t3.medium locust graph fails|
|300 users t3.medium locust table|
|300 users t3.medium locust table fails|

.. important::

    Using a vanilla ``postgres:17.2`` as the database container resulted in a
    similar outcome with very small differences (also remember the simulation
    itself contains an element of randomness):

    - 0 requests failed
    - Slightly higher requests/second served on average
    - Median response time for every individual request is slightly longer,
      as pointed out above
    - Slightly more frequent spikes on the 95th percentile response time graph
    - CPU load is between 40-80%

.. |t3.medium metrics| image:: ./_static/t3.medium_gp3_r100.png
.. |t3.medium locust graph| image:: ./_static/t3.medium_gp3_locust_graph.png
.. |t3.medium locust table| image:: ./_static/t3.medium_gp3_locust_table.png
.. |i3.large metrics| image:: ./_static/i3.large_nvme_r100.png
.. |TestCase.filter metrics| image:: ./_static/TestCase.filter_metrics.png
.. |TestCase.filter slowest info| image:: ./_static/TestCase.filter_slowest_info.png
.. |TestCase.filter metrics via Internet| image:: ./_static/TestCase.filter_metrics_via_internet.png
.. |TestRun.filter metrics| image:: ./_static/TestRun.filter_metrics.png
.. |TestRun.filter slowest info| image:: ./_static/TestRun.filter_slowest_info.png
.. |300 users t3.medium datadog| image:: ./_static/300usr_t3.medium_gp3_datadog.png
.. |300 users t3.medium locust graph| image:: ./_static/300usr_t3.medium_gp3_locust_graph.png
.. |300 users t3.medium locust graph fails| image:: ./_static/300usr_t3.medium_gp3_locust_graph_fails.png
.. |300 users t3.medium locust table| image:: ./_static/300usr_t3.medium_gp3_locust_table.png
.. |300 users t3.medium locust table fails| image:: ./_static/300usr_t3.medium_gp3_locust_table_fails.png
5 changes: 2 additions & 3 deletions tests/performance/web_simulation_test.py
@@ -79,10 +79,9 @@ async def visit_test_run_page_and_update_results(self, page):
        await page.goto(f"/runs/{chosen_run['id']}/")
        await page.wait_for_load_state()

        number_of_executions = len(
            self.json_rpc("TestExecution.filter", {"run_id": chosen_run["id"]})
        )

        if number_of_executions > 0:
            executions = self.json_rpc(
                "TestExecution.filter",