Document results for Parallel user session performance. Closes #721
- make web_simulation_test more robust by reading TE count via RPC
atodorov committed Feb 15, 2025
1 parent 04cf448 commit 4430a53
Showing 7 changed files with 76 additions and 3 deletions.
(5 of the changed files cannot be displayed in the diff view.)
74 changes: 74 additions & 0 deletions docs/source/hardware_performance.rst
@@ -204,6 +204,75 @@ to transfer the actual information:
Firefox timing metrics are explained in
`Mozilla's documentation <https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor/request_details#timings_tab>`_

Parallel user sessions performance
----------------------------------

Another important question is *How many parallel users can Kiwi TCMS support?*
The answer depends heavily on what these users are actually doing and how they
interact with the application, which varies greatly between teams and
organizations.

To help answer this question we've created the ``web_simulation_test.py`` script,
which uses Playwright to simulate realistic user activity as if it were coming
from a browser. The script implements the top 10 most common activities, such as
viewing the dashboard page, creating test plans and test cases and reporting
execution results in test runs. These actions and their frequencies were derived
from our `anonymous analytics metrics <https://kiwitcms.org/privacy/>`_!
The implementation includes random sleeps and a varying number of artifacts to
simulate plausible human interaction. In all scenarios pages were left to
load and exercise their default JavaScript actions - for example search pages
will query and fully load all of their results!
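
Purely as an illustration of the approach - not the actual script - a single
simulated browser session could be sketched with Playwright's async API roughly
like below. The base URL is a placeholder and login plus most of the actions are
omitted:

.. code-block:: python

    # Rough sketch of one simulated browser session; NOT web_simulation_test.py.
    import asyncio
    import random

    from playwright.async_api import async_playwright

    BASE_URL = "https://tcms.example.com"  # placeholder Kiwi TCMS instance


    async def one_user_session():
        async with async_playwright() as playwright:
            browser = await playwright.chromium.launch()
            page = await browser.new_page(base_url=BASE_URL)

            # view the dashboard and let its default JavaScript execute
            await page.goto("/")
            await page.wait_for_load_state()

            # random sleep to simulate human think-time between actions
            await asyncio.sleep(random.uniform(1, 10))

            # ... creating test plans/cases and reporting results would follow ...

            await browser.close()


    asyncio.run(one_user_session())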

.. important::

The information below has been gathered by using the following environment:

- Client: AWS c6a.metal (192 CPU cores) in us-east-1a (same availability zone as the server)
- Server: AWS t3.medium in us-east-1a, 30GB gp3 disk, defaults to 3000 IOPS,
default throughput 125 MiB/s
- Kiwi TCMS v14.0 via ``docker compose up``
- Database is ``mariadb:11.6.2`` with a persistent volume backed onto
the host filesystem
- Host OS - Ubuntu 24.04, freshly provisioned, no changes from defaults
- ``web_simulation_test.py`` @
`87dd61f <https://github.com/kiwitcms/Kiwi/blob/87dd61ff9955e79de4604259bc29ab7a923f0730/tests/performance/web_simulation_test.py>`_
``locust --processes -1 --users 300 --spawn-rate 0.33 --run-time 60m --locustfile web_simulation_test.py``
- ~ 15 min ramp-up of all users; then steady load
- Existing state: 20 x TestPlan; 200 x TestCase; 200 x TestRun
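
As a minimal, hypothetical illustration of how a locustfile shapes such a load
test (weighted tasks, random think-time), see the sketch below. The real
``web_simulation_test.py`` drives an actual browser via Playwright rather than
issuing plain HTTP requests, and the URLs and weights here are placeholders:

.. code-block:: python

    # Hypothetical locustfile skeleton; NOT web_simulation_test.py itself.
    from locust import HttpUser, between, task


    class SimulatedTester(HttpUser):
        # random sleep between actions to mimic human think-time
        wait_time = between(1, 10)

        @task(10)
        def view_dashboard(self):
            # most frequent action - load the landing page
            self.client.get("/")

        @task(2)
        def search_test_plans(self):
            # placeholder path; search pages load all of their results
            self.client.get("/plan/search/")

With ``--users 300 --spawn-rate 0.33`` Locust starts roughly one new simulated
user every 3 seconds until all 300 are active, which matches the ~ 15 min
ramp-up noted above.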

The results we've got are:

- 300 users were served with minimal errors; < 0.01% of all requests were affected
- Errors occurred 2 mins before the end of the testing session; they could also be
related to other processes in the host OS eating up the available CPU
- Cloning (usually more than 1 TC) is the heaviest operation, followed by
login and creating a new TR
- Performance for individual pages must be analyzed separately
- Median response time is relatively stable
- 95th percentile response time graph contains occasional spikes
- We've seen more spikes when the ramp-up period is shorter
- RAM usage is relatively constant; stayed < 1 GiB
- CPU load is between 20-60%

|300 users t3.medium datadog|
|300 users t3.medium locust graph|
|300 users t3.medium locust graph fails|
|300 users t3.medium locust table|
|300 users t3.medium locust table fails|

.. important::

Using a vanilla ``postgres:17.2`` as the database container resulted in a similar
outcome with very small differences (also remember that the simulation itself
contains an element of randomness):

- 0 requests failed
- Slightly higher requests/second served on average
- Median response time for every individual request is slightly longer
as pointed out above
- Slightly more frequent spikes on the 95th percentile response time graph
- CPU load is between 40-80%

.. |t3.medium metrics| image:: ./_static/t3.medium_gp3_r100.png
.. |t3.medium locust graph| image:: ./_static/t3.medium_gp3_locust_graph.png
.. |t3.medium locust table| image:: ./_static/t3.medium_gp3_locust_table.png
@@ -213,3 +282,8 @@ to transfer the actual information:
.. |TestCase.filter metrics via Internet| image:: ./_static/TestCase.filter_metrics_via_internet.png
.. |TestRun.filter metrics| image:: ./_static/TestRun.filter_metrics.png
.. |TestRun.filter slowest info| image:: ./_static/TestRun.filter_slowest_info.png
.. |300 users t3.medium datadog| image:: ./_static/300usr_t3.medium_gp3_datadog.png
.. |300 users t3.medium locust graph| image:: ./_static/300usr_t3.medium_gp3_locust_graph.png
.. |300 users t3.medium locust graph fails| image:: ./_static/300usr_t3.medium_gp3_locust_graph_fails.png
.. |300 users t3.medium locust table| image:: ./_static/300usr_t3.medium_gp3_locust_table.png
.. |300 users t3.medium locust table fails| image:: ./_static/300usr_t3.medium_gp3_locust_table_fails.png
5 changes: 2 additions & 3 deletions tests/performance/web_simulation_test.py
@@ -79,10 +79,9 @@ async def visit_test_run_page_and_update_results(self, page):
         await page.goto(f"/runs/{chosen_run['id']}/")
         await page.wait_for_load_state()

-        number_of_executions = int(
-            await page.locator(".test-executions-count").text_content()
+        number_of_executions = len(
+            self.json_rpc("TestExecution.filter", {"run_id": chosen_run["id"]})
         )

         if number_of_executions > 0:
             executions = self.json_rpc(
                 "TestExecution.filter",
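
The change above replaces scraping a DOM element for the execution count with a
JSON-RPC query. As a standalone, hypothetical illustration of the same query using
the tcms-api client (assuming the tcms-api package is installed and credentials
are configured in ~/.tcms.conf; the run ID below is a placeholder):

# Standalone sketch of the equivalent RPC query; not part of the test script.
from tcms_api import TCMS

rpc = TCMS().exec  # RPC proxy to the Kiwi TCMS server configured in ~/.tcms.conf

run_id = 42  # placeholder TestRun ID
executions = rpc.TestExecution.filter({"run_id": run_id})
print(f"TestRun {run_id} contains {len(executions)} test executions")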
