Update performance testing docs #3911

Merged 2 commits on Feb 15, 2025
Binary file removed docs/source/_static/t3.medium_gp2_r100.png
Binary file added docs/source/_static/t3.medium_gp3_r100.png
151 changes: 128 additions & 23 deletions docs/source/hardware_performance.rst
@@ -47,49 +47,78 @@ we often hear is *How many test execution results can Kiwi TCMS deal with?*

The information below has been gathered by using the following environment:

- Client: AWS t3.medium in us-east-1a (same availability zone as server)
- Server: AWS t3.medium in us-east-1a, 30GB gp3 disk, defaults to 3000 IOPS,
  default throughput 125 MiB/s
- Kiwi TCMS v14.0 via ``docker compose up``
- Database is ``mariadb:11.6.2`` with a persistent volume backed onto
  the host filesystem
- Host OS - Ubuntu 24.04, freshly provisioned, no changes from defaults
- ``api_write_test.py`` @
  `748787a <https://github.com/kiwitcms/Kiwi/blob/748787ad37702ed4df2554330eef987ec40268b8/tests/performance/api_write_test.py>`_
  with ``RANGE_SIZE=100``;
  ``locust --users 1 --spawn-rate 1 --run-time 60m --locustfile api_write_test.py``
- For each invocation ``api_write_test.py`` creates a new *Product*, *Version*,
  *Build* and *TestPlan*. The test plan contains ``RANGE_SIZE x test cases``, then
  ``RANGE_SIZE x test runs``, each containing the previous test cases, and finally
  updates the results for all of them. This simulates a huge test matrix against
  the same test plan/product/version/build, e.g. testing on multiple different
  platforms (browser versions + OS combinations, for example)
- The total number of test execution results is ``RANGE_SIZE^2``
- The total number of API calls is ``10 + 3*RANGE_SIZE + 2*RANGE_SIZE^2``
- Single client, no other server load in parallel

For ``RANGE_SIZE=100`` we've got ``10000`` test execution results and
``20310`` API calls in a single script invocation!
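The arithmetic above can be double-checked with a few lines of Python (a sketch; ``workload_counts`` is an illustrative helper name, not part of the test script):

```python
# Sketch: verify the call arithmetic for one invocation of the API write workload.
def workload_counts(range_size: int) -> tuple[int, int]:
    """Return (test_execution_results, api_calls) for one script invocation."""
    results = range_size ** 2  # RANGE_SIZE test runs x RANGE_SIZE test cases each
    calls = 10 + 3 * range_size + 2 * range_size ** 2
    return results, calls

print(workload_counts(100))  # -> (10000, 20310)
```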

The results we've got are:

- 92000+ API calls/hour
- 45000+ test executions recorded/hour
- 25+ requests/second
- 33 ms/request (average); 73 ms/request (95%)
- 0 requests failed

|t3.medium metrics|
|t3.medium locust graph|
|t3.medium locust table|

.. important::

    Using a vanilla ``postgres:17.2`` as the database container resulted in worse
    performance out of the box. For the same CPU/system load we saw numbers which
    were only 60% of the ones reported above. Bombarding Kiwi TCMS with 2 Locust
    users produced a comparable outcome at the expense of CPU load averaging 90%
    on the same hardware! This is due to several factors:

    - More rigorous constraint checking in Postgres
    - Postgres is good at handling "long connections" while
      MariaDB is better at handling "short connections"
    - Connecting to Postgres is slower than connecting to MariaDB
      (think process vs. thread)
    - Missing DB connection pooling as part of the application
      framework until very recently
    - Possibly Postgres performing more data analysis & optimization behind the
      scenes

    Aside from involving a DBA to monitor and tune the performance of your
    Postgres database to match the behavior of Kiwi TCMS there is little we can
    do about it!
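For reference, framework-level pooling recently landed in Django itself (5.1+, with psycopg 3). A minimal, hypothetical settings fragment - not Kiwi TCMS's actual configuration - looks like this:

```python
# Hypothetical Django settings fragment: built-in Postgres connection pooling
# (Django >= 5.1 with psycopg 3). Database name and values are illustrative only.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "kiwi",
        "OPTIONS": {
            # Reuse server connections instead of paying the per-request,
            # process-based connection cost mentioned above.
            "pool": True,
        },
    }
}
```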


.. important::

    In the past (v12.0) we've also experimented with an *i3.large* storage
    optimized instance which has Non-Volatile Memory Express (NVMe) SSD-backed
    storage optimized for low latency and very high random I/O performance. We had
    to ``mkfs.xfs /dev/nvme0n1 && mount /dev/nvme0n1 /var/lib/docker`` before
    starting the containers.

    While you can see that ``nvme`` disk latency is an order of magnitude faster
    (< 0.1 ms), with the occasional peak from the root filesystem, the overall
    application performance didn't change a lot. The times for ``RANGE_SIZE=30``
    improved but the times for ``RANGE_SIZE=100`` worsened a bit.

|i3.large metrics|

@@ -175,10 +204,86 @@ to transfer the actual information:
Firefox timing metrics are explained in
`Mozilla's documentation <https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor/request_details#timings_tab>`_

Parallel user sessions performance
----------------------------------

Another important question is *How many parallel users can Kiwi TCMS support?*
The answer is heavily dependent on what these users are actually doing and how
they interact with the application, which varies greatly between different
teams and organizations.

To help answer this question we've created the ``web_simulation_test.py`` script
which uses Playwright to simulate realistic user activity as if it came
from a browser. The script implements the top 10 most common activities, such as
viewing the dashboard page, creating test plans and test cases, and reporting
execution results in test runs. These actions and their frequency were derived
from our `anonymous analytics metrics <https://kiwitcms.org/privacy/>`_!
The implementation includes random sleeps and a varying number of artifacts to
simulate plausible human interaction. In all scenarios pages were left to
load and exercise their default JavaScript actions - for example, search pages
will query and fully load all of their results!
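The frequency-weighted action model can be sketched in plain Python (the action names and weights below are made up for illustration - the real ones in ``web_simulation_test.py`` come from the analytics data):

```python
import random

# Illustrative action weights; the real frequencies are derived from
# anonymous analytics metrics, not these made-up numbers.
ACTIONS = {
    "view_dashboard": 30,
    "search_test_cases": 20,
    "view_test_run": 15,
    "report_execution_result": 15,
    "create_test_case": 8,
    "create_test_plan": 5,
    "clone_test_cases": 4,
    "create_test_run": 3,
}

def pick_action(rng):
    """Pick the next simulated user action, weighted by observed frequency."""
    names = list(ACTIONS)
    return rng.choices(names, weights=[ACTIONS[n] for n in names], k=1)[0]

rng = random.Random(42)  # fixed seed so the simulation is reproducible
sample = [pick_action(rng) for _ in range(1000)]
print(max(ACTIONS, key=sample.count))  # the most heavily weighted action dominates
```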

.. important::

    The information below has been gathered by using the following environment:

    - Client: AWS c6a.metal (192 CPU cores) in us-east-1a (same availability
      zone as server)
    - Server: AWS t3.medium in us-east-1a, 30GB gp3 disk, defaults to 3000 IOPS,
      default throughput 125 MiB/s
    - Kiwi TCMS v14.0 via ``docker compose up``
    - Database is ``mariadb:11.6.2`` with a persistent volume backed onto
      the host filesystem
    - Host OS - Ubuntu 24.04, freshly provisioned, no changes from defaults
    - ``web_simulation_test.py`` @
      `87dd61f <https://github.com/kiwitcms/Kiwi/blob/87dd61ff9955e79de4604259bc29ab7a923f0730/tests/performance/web_simulation_test.py>`_;
      ``locust --processes -1 --users 300 --spawn-rate 0.33 --run-time 60m --locustfile web_simulation_test.py``
    - ~ 15 min ramp-up of all users; then steady load
    - Existing state: 20 x TestPlan; 200 x TestCase; 200 x TestRun
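The ~15 min ramp-up figure follows directly from the ``locust`` flags above:

```python
# Sketch: ramp-up time implied by --users 300 --spawn-rate 0.33
users = 300
spawn_rate = 0.33  # new users spawned per second
ramp_up_min = users / spawn_rate / 60
print(round(ramp_up_min, 1))  # -> 15.2 minutes before steady load
```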

The results we've got are:

- 300 users were served with minimal errors; < 0.01% of all requests
- Errors occurred 2 mins before the end of the testing session; they could also
  be related to other processes in the host OS eating up available CPU
- Cloning (usually more than 1 TC) is the heaviest operation, followed by
  login and creating a new TR
- Performance for individual pages must be analyzed separately
- Median response time is relatively stable
- 95th percentile response time graph contains occasional spikes
- We've seen more spikes when the ramp-up period is shorter
- RAM usage is relatively constant; stayed < 1 GiB
- CPU load is between 20-60%
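The observation that the median stays flat while the 95th percentile spikes is easy to reproduce with toy numbers (the latencies below are made up, not measured):

```python
import statistics

# Toy latency sample (ms): mostly fast responses plus a few slow outliers,
# loosely mimicking the shape of the response time graphs.
latencies = [40] * 90 + [45] * 5 + [800] * 5

median = statistics.median(latencies)
p95 = sorted(latencies)[95]  # simple nearest-rank 95th percentile for 100 samples
print(median, p95)  # outliers barely move the median but dominate p95
```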

|300 users t3.medium datadog|
|300 users t3.medium locust graph|
|300 users t3.medium locust graph fails|
|300 users t3.medium locust table|
|300 users t3.medium locust table fails|

.. important::

    Using a vanilla ``postgres:17.2`` as the database container resulted in a
    similar outcome with very small differences (also remember the simulation
    itself contains an element of randomness):

    - 0 requests failed
    - Slightly higher requests/second served on average
    - Median response time for every individual request is slightly longer,
      as pointed out above
    - Slightly more frequent spikes on the 95th percentile response time graph
    - CPU load is between 40-80%

.. |t3.medium metrics| image:: ./_static/t3.medium_gp3_r100.png
.. |t3.medium locust graph| image:: ./_static/t3.medium_gp3_locust_graph.png
.. |t3.medium locust table| image:: ./_static/t3.medium_gp3_locust_table.png
.. |i3.large metrics| image:: ./_static/i3.large_nvme_r100.png
.. |TestCase.filter metrics| image:: ./_static/TestCase.filter_metrics.png
.. |TestCase.filter slowest info| image:: ./_static/TestCase.filter_slowest_info.png
.. |TestCase.filter metrics via Internet| image:: ./_static/TestCase.filter_metrics_via_internet.png
.. |TestRun.filter metrics| image:: ./_static/TestRun.filter_metrics.png
.. |TestRun.filter slowest info| image:: ./_static/TestRun.filter_slowest_info.png
.. |300 users t3.medium datadog| image:: ./_static/300usr_t3.medium_gp3_datadog.png
.. |300 users t3.medium locust graph| image:: ./_static/300usr_t3.medium_gp3_locust_graph.png
.. |300 users t3.medium locust graph fails| image:: ./_static/300usr_t3.medium_gp3_locust_graph_fails.png
.. |300 users t3.medium locust table| image:: ./_static/300usr_t3.medium_gp3_locust_table.png
.. |300 users t3.medium locust table fails| image:: ./_static/300usr_t3.medium_gp3_locust_table_fails.png
5 changes: 2 additions & 3 deletions tests/performance/web_simulation_test.py
@@ -79,10 +79,9 @@ async def visit_test_run_page_and_update_results(self, page):
        await page.goto(f"/runs/{chosen_run['id']}/")
        await page.wait_for_load_state()

        number_of_executions = len(
            self.json_rpc("TestExecution.filter", {"run_id": chosen_run["id"]})
        )

        if number_of_executions > 0:
            executions = self.json_rpc(
                "TestExecution.filter",