This project is for benchmarking different S3 workloads using various languages and S3 clients.
This project is under active development and subject to change.
To start, you'll need:
- Python 3.9+ with pip
- On Amazon Linux 2023, a script is provided to install further tools.
Otherwise, depending on the language you want to benchmark, you'll need:
- CMake 3.22+
- C99 / C++20 compiler (e.g. gcc, clang)
- JDK17+ (e.g. corretto, openjdk)
- Maven
- Python C extension headers and libraries (e.g. python3-devel)
To benchmark ALL the workloads, your machine needs 300+ GiB of disk space available, and fast enough internet to upload a terabyte to S3 within your lifetime. But if you're only running 1 workload, you'll upload fewer files and use less disk space.
Your machine must have AWS credentials, with permission to read and write to an S3 bucket.
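To confirm your machine has working AWS credentials, you can call STS with the AWS CLI (any of the standard credential sources will work: environment variables, `~/.aws/credentials`, or an EC2 instance role):

```sh
# Prints your account and IAM identity if credentials are found and valid
aws sts get-caller-identity
```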
First, clone this repo.
Then install the requirements listed above. On Amazon Linux 2023, you can simply run this script:

```sh
./aws-crt-s3-benchmarks/scripts/install-tools-AL2023.py
```
Then, install packages needed by the Python scripts:

```sh
python3 -m pip install -r aws-crt-s3-benchmarks/scripts/requirements.txt
```
Next, run `scripts/prep-s3-files.py`. This script creates and configures an S3 bucket, puts files in S3 for benchmarks to download, and creates files on disk for benchmarks to upload:
```
usage: prep-s3-files.py [-h] --bucket BUCKET --region REGION --files-dir FILES_DIR
                        [--workloads WORKLOADS [WORKLOADS ...]]

Create files (on disk, and in S3 bucket) needed to run the benchmarks

optional arguments:
  -h, --help            show this help message and exit
  --bucket BUCKET       S3 bucket (will be created if necessary)
  --region REGION       AWS region (e.g. us-west-2)
  --files-dir FILES_DIR
                        Root directory for files to upload and download (e.g. ~/files)
  --workloads WORKLOADS [WORKLOADS ...]
                        Path to specific workload.run.json file. If not specified,
                        everything in workloads/ is prepared (uploading
                        100+ GiB to S3 and creating 100+ GiB on disk).
```
This script can be run repeatedly. It skips unnecessary work (e.g. won't upload a file that already exists).
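For example, to prepare a single workload, the invocation might look like this (the bucket name and workload filename below are placeholders; pick a real file from workloads/):

```sh
# Prepare files for one workload only, instead of all of them
./aws-crt-s3-benchmarks/scripts/prep-s3-files.py \
    --bucket my-test-bucket \
    --region us-west-2 \
    --files-dir ~/files \
    --workloads aws-crt-s3-benchmarks/workloads/my-workload.run.json
```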
Here are the IDs used for various S3 Clients, and the runner you must build to benchmark them:
S3_CLIENT | Actual S3 Client Used | Language | Benchmark Runner |
---|---|---|---|
crt-c | aws-c-s3 | c | runners/s3-benchrunner-c |
crt-python | aws-crt-python | python | runners/s3-benchrunner-python |
boto3-crt | boto3 using CRT | python | runners/s3-benchrunner-python |
boto3-classic | boto3 with pure-python transfer manager | python | runners/s3-benchrunner-python |
cli-crt | AWS CLI v2 using CRT | python | runners/s3-benchrunner-python |
cli-classic | AWS CLI v2 with pure-python transfer manager | python | runners/s3-benchrunner-python |
crt-java | aws-crt-java | java | runners/s3-benchrunner-java |
sdk-java-client-crt | aws-sdk-java-v2 with CRT based S3AsyncClient | java | runners/s3-benchrunner-java |
sdk-java-client-classic | aws-sdk-java-v2 with pure-java S3AsyncClient | java | runners/s3-benchrunner-java |
sdk-java-tm-crt | aws-sdk-java-v2 with CRT based S3TransferManager | java | runners/s3-benchrunner-java |
sdk-java-tm-classic | aws-sdk-java-v2 with pure-java S3TransferManager | java | runners/s3-benchrunner-java |
sdk-cpp-client-crt | aws-sdk-cpp with S3CrtClient | cpp | runners/s3-benchrunner-cpp |
sdk-cpp-client-classic | aws-sdk-cpp with (non-CRT) S3Client | cpp | runners/s3-benchrunner-cpp |
sdk-cpp-tm-classic | aws-sdk-cpp with (non-CRT) TransferManager | cpp | runners/s3-benchrunner-cpp |
sdk-rust-tm | aws-s3-transfer-manager-rs | rust | runners/s3-benchrunner-rust |
You must build a "runner" for the S3 client you'll be benchmarking. For example, build runners/s3-benchrunner-python to benchmark aws-crt-python, boto3, or AWS CLI.
Run `scripts/build-runner.py`:

```
usage: build-runner.py [-h] --lang {c,python,java} --build-dir BUILD_DIR [--branch BRANCH]

Build a runner and its dependencies

optional arguments:
  -h, --help            show this help message and exit
  --lang {c,python,java}
                        Build s3-benchrunner-<lang>
  --build-dir BUILD_DIR
                        Root dir for build artifacts
  --branch BRANCH       Git branch/commit/tag to use when pulling dependencies
```
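For example, building the Python runner might look like this (the build directory below is an arbitrary placeholder):

```sh
# Build the Python runner and its dependencies into ~/s3-benchmark-build
./aws-crt-s3-benchmarks/scripts/build-runner.py \
    --lang python \
    --build-dir ~/s3-benchmark-build
```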
The last line of output from `build-runner.py` displays the RUNNER_CMD you'll need in the next step.
NOTE: Each runner has a README.md with more advanced instructions. `build-runner.py` isn't meant to handle advanced use cases like tweaking dependencies, iterating locally, DEBUG builds, etc.
All runners have the same command line interface, and expect to be run from the FILES_DIR you passed to the `prep-s3-files.py` script:

```sh
cd FILES_DIR
RUNNER_CMD S3_CLIENT WORKLOAD BUCKET REGION TARGET_THROUGHPUT
```

- S3_CLIENT: ID of the S3 client to use (see table above)
- RUNNER_CMD: Command to launch the runner (e.g. `java -jar path/to/runner.jar`). This is the last line printed by `build-runner.py` in the previous step.
- WORKLOAD: Path to a workload.run.json file (see: workloads/)
- BUCKET: S3 bucket name (e.g. my-test-bucket)
- REGION: AWS region (e.g. us-west-2)
- TARGET_THROUGHPUT: Target throughput, in gigabits per second. Floating point allowed. Enter the EC2 type's "Network Bandwidth (Gbps)" (e.g. "100.0" for c5n.18xlarge)
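Putting the arguments together, a run might look like this (`<RUNNER_CMD>`, the workload path, and the bucket name are placeholders; substitute the command that `build-runner.py` printed for you):

```sh
cd ~/files
# <RUNNER_CMD> is whatever build-runner.py printed, e.g. "java -jar path/to/runner.jar"
<RUNNER_CMD> crt-python /path/to/workloads/my-workload.run.json my-test-bucket us-west-2 100.0
```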
Most runners search for AWS credentials in the standard locations: environment variables, the shared credentials file (e.g. ~/.aws/credentials), and (on EC2) the instance's IAM role.
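A rough sketch of that default-chain lookup order is below. This helper is hypothetical, not part of any runner; real AWS SDKs also consult config files, SSO, and container/instance metadata:

```python
import os
from pathlib import Path

def find_credentials(env):
    """Hypothetical sketch of a typical AWS default credential chain."""
    # 1. Environment variables take priority.
    if "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env:
        return "environment"
    # 2. Then the shared credentials file (~/.aws/credentials by default).
    cred_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE",
                             str(Path.home() / ".aws" / "credentials")))
    if cred_file.exists():
        return "shared-credentials-file"
    # 3. Real SDKs would fall back to instance/container metadata here.
    return None

print(find_credentials({"AWS_ACCESS_KEY_ID": "AKIA...",
                        "AWS_SECRET_ACCESS_KEY": "secret"}))
# → environment
```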
If you want to run multiple workloads (or ALL workloads) in one go, use this helper script: run-benchmarks.py.
See workloads/ for the available workload definitions.
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.