The ArangoDB test agent is intended to run long-duration tests on ArangoDB clusters. During a test, various 'user-like' operations are run while the test agent introduces chaos.
When a failure in one of the test operations is detected, the test agent will log the failure, accompanied by all relevant information (such as database server log files).
The test agent will introduce the following kinds of chaos:
- Restart a server, one of each type at a time
- Kill a server, one of each type at a time
- Entire machine (with agent, dbserver & coordinator) is restarted
- Entire machine (with agent, dbserver & coordinator) is replaced (currently impossible)
- Entire machine (with dbserver & coordinator) is lost and replaced by another one
- Entire machine (with dbserver & coordinator) is added
- Entire machine (with dbserver & coordinator) is removed
- Network traffic between servers is blocked (iptables REJECT)
- Network traffic between servers is ignored (iptables DROP)
- Split brain
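REJECT and DROP differ in what the sending side sees. REJECT answers with an error (an ICMP port-unreachable message by default), so connections fail quickly. DROP discards packets silently, so clients only notice through timeouts. The bash sketch below can reproduce either condition by hand. It uses a hypothetical dry-run helper, `block_rules`, that prints the matching rules and assumes both containers sit on the same Docker bridge, where Docker passes their traffic through its `DOCKER-USER` chain. How the test agent itself installs its rules is not described here.

```shell
#!/usr/bin/env bash
# Print (do not apply) iptables rules that cut traffic between two
# containers in both directions. Assumes both containers are on the same
# Docker bridge. The helper name is hypothetical.
block_rules() {
  local a=$1 b=$2 target=$3
  case $target in
    REJECT|DROP) ;;
    *) echo "target must be REJECT or DROP" >&2; return 1 ;;
  esac
  # One rule per direction, so neither side can reach the other.
  printf 'iptables -I DOCKER-USER -s %s -d %s -j %s\n' "$a" "$b" "$target"
  printf 'iptables -I DOCKER-USER -s %s -d %s -j %s\n' "$b" "$a" "$target"
}
```

For example, `block_rules 172.17.0.2 172.17.0.3 DROP | sudo sh` applies the rules. Piping the same output through `sed 's/ -I / -D /'` before `sudo sh` removes them again.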
It should also be possible to:
- Pause introducing chaos
- Resume introducing chaos
The test agent will allow multiple test scripts to be developed and run. The test operations covered by those scripts will include (among others):
- Create collections
- Drop collections
- Import documents
- Create documents
- Read existing documents
- Read non-existing documents
- Remove existing documents
- Remove existing documents with explicit & last revision
- Remove existing documents with explicit & non-last revision
- Remove non-existing documents
- Update existing documents
- Update existing documents with explicit & last revision
- Update existing documents with explicit & non-last revision
- Update non-existing documents
- Replace existing documents
- Replace existing documents with explicit & last revision
- Replace existing documents with explicit & non-last revision
- Replace non-existing documents
- Query documents (AQL)
- Query documents with long running query (AQL SLEEP)
- Modify documents with query (AQL)
- Modify documents with long running query (AQL SLEEP)
- Backup entire databases (export is not yet available on clusters)
- Rebalance shards
- Create graphs
- Add vertex and edge documents to existing graphs
- Traverse graphs
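The revision-guarded remove scenarios above map to distinct HTTP outcomes in ArangoDB's document API. A DELETE with an `If-Match` header naming a stale revision is rejected with 412 (precondition failed). A missing document yields 404. A successful removal returns 200 (with `waitForSync`) or 202. The bash sketch below shows how a test script might issue such a request and decide which status codes to accept. The helper names, the `ARANGO_URL`/`ARANGO_AUTH` variables and the use of the `_system` database are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the remove scenarios listed above.
# ARANGO_URL and ARANGO_AUTH are assumptions (a coordinator endpoint and
# user:password); the _system database is used for illustration only.

# remove_doc COLLECTION KEY [REV]: DELETE a document and print the HTTP
# status. With REV, an If-Match header makes the removal conditional on
# REV being the document's current revision.
remove_doc() {
  local coll=$1 key=$2 rev=${3:-}
  local -a args=(-s -o /dev/null -w '%{http_code}' -u "${ARANGO_AUTH:-root:}" -X DELETE)
  if [[ -n $rev ]]; then
    args+=(-H "If-Match: \"$rev\"")
  fi
  curl "${args[@]}" "${ARANGO_URL:-http://localhost:8529}/_db/_system/_api/document/$coll/$key"
}

# expected_remove_status SCENARIO: status codes a test should accept.
expected_remove_status() {
  case $1 in
    existing|last-rev) echo "200 202" ;;  # 200 with waitForSync, else 202
    non-last-rev)      echo "412" ;;      # revision mismatch
    non-existing)      echo "404" ;;
    *)                 return 1 ;;
  esac
}
```

A check could then read `status=$(remove_doc users alice "$old_rev"); [[ " $(expected_remove_status non-last-rev) " == *" $status "* ]]`.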
To build the image and run the test agent on a single machine:

```shell
make docker
export IP=<your-local-IP>
docker run -it --rm -p 4200:4200 \
    -v /var/run/docker.sock:/var/run/docker.sock \
    arangodb/testagent --docker-host-ip=$IP
```
Then connect your browser to http://localhost:4200 to see the test dashboard.
To run 'machines' on multiple physical machines, you must provide the endpoints of the Docker daemons running on these machines. For example:

```shell
export IP=<your-local-IP>
docker run -it --rm -p 4200:4200 \
    -v /var/run/docker.sock:/var/run/docker.sock \
    arangodb/testagent --docker-host-ip=$IP \
    --docker-endpoint=tcp://192.168.1.1:2376 \
    --docker-endpoint=tcp://192.168.1.2:2376 \
    --docker-endpoint=tcp://192.168.1.3:2376
```
To allow remote access to the Docker daemons on those machines, you might need to add `-H tcp://0.0.0.0:2376 --storage-driver=overlay2` to the `ExecStart` line in your `docker.service` file.
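Instead of editing `docker.service` directly, the same flags can be added with a systemd drop-in. In a drop-in, an empty `ExecStart=` line must first clear the inherited command before a new one is set. The bash sketch below writes such a file. The file name `remote-api.conf` and the default daemon command line (`/usr/bin/dockerd -H fd://`) are assumptions; copy the `ExecStart` value from your own unit, since it varies between distributions.

```shell
#!/usr/bin/env bash
# Write a systemd drop-in that makes dockerd also listen on TCP port 2376.
# DIR defaults to the standard drop-in directory for docker.service.
# BASE is the daemon command line from your existing ExecStart= (the
# default below is an assumption and varies between distributions).
write_docker_dropin() {
  local dir="${1:-/etc/systemd/system/docker.service.d}"
  local base="${2:-/usr/bin/dockerd -H fd://}"
  mkdir -p "$dir" || return 1
  cat > "$dir/remote-api.conf" <<EOF
[Service]
# An empty ExecStart= clears the command inherited from docker.service.
ExecStart=
ExecStart=$base -H tcp://0.0.0.0:2376 --storage-driver=overlay2
EOF
}
```

After running it as root, apply the change with `systemctl daemon-reload` and `systemctl restart docker`. An unauthenticated TCP socket gives anyone who can reach it full control of that Docker daemon, so expose it only on a trusted network.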
To use a TLS-verified Docker service (requires ArangoDB Starter version > 0.13.10), additionally export the default Docker environment variable that enables verification before running the above Docker command:

```shell
export DOCKER_TLS_VERIFIED=1
```
The above assumes that the relevant ca.cert, cert.pem and key.pem files reside in the default location for Docker client certificates, $HOME/.docker. If you would like to store the certificates in a different directory, specify it accordingly:

```shell
export DOCKER_CERT_PATH=/path/to/cert
```
The simple test is the first test introduced, and the only one available in versions below 1.1.0. It performs various operations on collections and documents within the _system database, such as:
- Create collections
- Remove collections
- Create documents
- Import documents
- Read documents
- Update documents via API
- Update documents via AQL queries
- Replace documents
- Query documents
Actions are executed in random order.
All tests except simple form a suite named complex. These tests share common actions and settings and are contained in the complex package.
Unlike simple, the complex tests do not execute actions in random order, because they are intended to find bugs that might manifest only when a deployment contains a certain amount of data.
The document tests (DocColTest and OneShardTest) perform various operations on collections and documents within a database:
- Create database
- Create collection
- Create documents
- Read documents
- Update documents
- Drop collection
- Drop database
Each document contains a text field named payload that holds a random sequence of characters. Its length is configurable and is limited only by the available disk space. When documents are read, the random sequence is regenerated from the same seed and compared to the one read from the database.
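The seed-based check can be sketched in bash. This is an illustration only. The helper names, the alphabet and the use of bash's seeded `$RANDOM` are assumptions, not the test agent's own generator. Assigning to `RANDOM` seeds bash's generator, so regenerating with the same seed reproduces the payload. A stored document can therefore be verified without keeping a second copy of its contents.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating seed-based payload verification.
# bash's $RANDOM stands in for whatever generator the test agent uses.

ALPHABET='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

# gen_payload SEED LENGTH: print LENGTH pseudo-random characters.
# The same SEED always yields the same string (for a given bash version).
gen_payload() {
  local seed=$1 len=$2 out='' i idx
  local n=${#ALPHABET}
  RANDOM=$seed
  for ((i = 0; i < len; i++)); do
    idx=$(( RANDOM % n ))
    out+=${ALPHABET:idx:1}
  done
  printf '%s\n' "$out"
}

# verify_payload SEED LENGTH STORED: succeed only if STORED equals the
# payload regenerated from SEED.
verify_payload() {
  [[ $(gen_payload "$1" "$2") == "$3" ]]
}
```

For example, `p=$(gen_payload 42 1024)` produces a payload to store, and `verify_payload 42 1024 "$p_read_back"` checks the value read back from the database.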
The only difference between DocColTest and OneShardTest is that the latter uses a single-sharded database.
There are three tests that work with graphs, differing only in the type of graph used: CommunityGraphTest, SmartGraphTest and EnterpriseGraphTest. Graph tests perform the following actions:
- Create graph and underlying collections
- Create vertices
- Create edges
- Traverse graph
- Drop graph and collections
Both vertex and edge documents contain a payload field of configurable size.
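A traversal step can be expressed as an AQL query sent to ArangoDB's cursor endpoint (`POST /_api/cursor`). The bash sketch below builds such a request, with the start vertex and graph name passed as bind parameters. The helper names, the depth default, the `ARANGO_URL`/`ARANGO_AUTH` variables and the use of the default database are assumptions. Keys are assumed to contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# traversal_body GRAPH START [DEPTH]: print the JSON body of an AQL
# traversal request. START is a document id such as "persons/alice".
traversal_body() {
  local graph=$1 start=$2 depth=${3:-3}
  printf '{"query":"FOR v IN 1..%d OUTBOUND @start GRAPH @graph RETURN v._key","bindVars":{"start":"%s","graph":"%s"}}\n' \
    "$depth" "$start" "$graph"
}

# traverse GRAPH START [DEPTH]: send the traversal to a coordinator.
# ARANGO_URL and ARANGO_AUTH are assumptions.
traverse() {
  curl -s -u "${ARANGO_AUTH:-root:}" -X POST \
    --data-binary "$(traversal_body "$@")" \
    "${ARANGO_URL:-http://localhost:8529}/_api/cursor"
}
```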
The test agent accepts the following command-line options:

| Option | Description |
|--------|-------------|
| `--agency-size number` | Set the size of the agency for the new cluster. |
| `--port` | Set the first port used by the test agent (first of a range of ports). |
| `--log-level` | Adjust the log level (one of debug, info, warning, error). |
| `--chaos-level` | Chaos level. Allowed values: 0-4. 0 = no chaos, 4 = maximum chaos. Default: 4. |
| `--arangodb-image` | Docker image containing `arangodb`. The image must exist in the local Docker host. |
| `--arango-image` | Docker image containing `arangod`. |
| `--docker-endpoint` | How to reach the Docker host (this option can be specified multiple times to use multiple Docker hosts). |
| `--docker-host-ip` | IP address of the Docker host. |
| `--docker-net-host` | If set, run all containers with `--net=host` (make sure the testagent container itself is also started with `--net=host`). Network chaos is not supported with host networking. |
| `--force-one-shard` | If set, force a one-shard ArangoDB cluster (default: false). |
| `--replication-version-2` | If set, use replication version 2. |
| `--return-403-on-failed-write-concern` | If set, the option `--cluster.failed-write-concern-status-code` is not set for DB servers; otherwise it is set to 503. Warning: if this option is set, a 403 response from a coordinator is treated as a failure (default: false). |
| `--docker-interface` | Network interface that Docker containers are connected to (default: docker0). |
| `--report-dir` | Directory in which failure reports are created. Can also be set with the environment variable `REPORT_DIR`; the CLI parameter takes priority over the environment variable (default: .). |
| `--collect-metrics` | If set, metrics about Docker containers are collected and saved to files. Collected metrics: `cpu_total_usage`, `cpu_usage_in_kernelmode`, `cpu_usage_in_usermode`, `system_cpu_usage`, `memory_usage`. |
| `--metrics-dir` | Directory in which metrics are stored. Can also be set with the environment variable `METRICS_DIR`; the CLI parameter takes priority over the environment variable (default: .). |
| `--privileged` | If set, run all containers with `--privileged`. |
| `--max-machines` | Upper limit to the number of machines in a cluster (default: 10). |
| `--enable-test` | Enable a particular test. This option can be specified multiple times to run several tests simultaneously. Default: run all tests. Available tests: simple, DocColTest, OneShardTest, CommunityGraphTest, SmartGraphTest, EnterpriseGraphTest. |
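Several of these options are usually combined: a subset of tests, a reduced chaos level and a report directory that survives the container. The bash sketch below assembles such an invocation using only flags from the list above. It prints one argument per line by default and executes only when `RUN=1`. The helper name `testagent_cmd`, the `RUN` switch and the `/reports` mount point are assumptions.

```shell
#!/usr/bin/env bash
# testagent_cmd IP CHAOS_LEVEL TEST...: run selected tests at the given
# chaos level, writing failure reports to ./reports on the host.
# The helper name, RUN switch and /reports mount point are assumptions.
testagent_cmd() {
  local ip=$1 chaos=$2 t
  shift 2
  local -a cmd
  cmd=(docker run -it --rm -p 4200:4200
    -v /var/run/docker.sock:/var/run/docker.sock
    -v "$PWD/reports:/reports"
    arangodb/testagent --docker-host-ip="$ip" --chaos-level="$chaos"
    --report-dir=/reports)
  for t in "$@"; do
    cmd+=(--enable-test="$t")   # repeatable: one flag per test
  done
  if [[ ${RUN:-0} == 1 ]]; then
    "${cmd[@]}"
  else
    printf '%s\n' "${cmd[@]}"   # dry run: one argument per line
  fi
}
```

For example, `RUN=1 testagent_cmd "$IP" 2 DocColTest OneShardTest` runs only the document tests with moderate chaos.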
Options starting with --simple affect only the simple test.
All options starting with --complex affect all the tests in the complex suite.
All options starting with --doc affect all the "document" tests in the complex suite (DocColTest and OneShardTest).
All options starting with --graph affect all the "graph" tests in the complex suite (CommunityGraphTest, SmartGraphTest, EnterpriseGraphTest).
| Option | Description |
|--------|-------------|
| `--simple-max-documents` | Upper limit to the number of documents created in the simple test. |
| `--simple-max-collections` | Upper limit to the number of collections created in the simple test. |
| `--simple-operation-timeout` | Timeout per database operation. |
| `--simple-retry-timeout` | How long tests are retried before giving up. |
| `--complex-shards` | Number of shards. |
| `--complex-replicationFactor` | Replication factor. |
| `--complex-operation-timeout` | Timeout per database operation. |
| `--complex-retry-timeout` | How long tests are retried before giving up. |
| `--complex-step-timeout` | Pause between test actions. |
| `--doc-max-documents` | Upper limit to the number of documents created in document collection tests. |
| `--doc-batch-size` | Number of documents created during one test step. |
| `--doc-document-size` | Size of the payload field in bytes. |
| `--doc-max-updates` | Number of update operations performed on each document. |
| `--graph-max-vertices` | Upper limit to the number of vertices. |
| `--graph-vertex-size` | Size of the payload field in bytes in all vertices. |
| `--graph-edge-size` | Size of the payload field in bytes in all edges. |
| `--graph-traversal-ops` | Number of traversal operations performed in one test step. |
| `--graph-batch-size` | Number of vertices/edges created in one test step. |
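The prefix rules above determine which tests a tuning option reaches. In particular, `--complex-*` options affect the graph tests as well as the document tests. The bash sketch below restates the mapping as a lookup. The helper name is hypothetical, and the mapping mirrors only the documented prefixes.

```shell
#!/usr/bin/env bash
# tests_affected_by OPTION: print the tests an option applies to, following
# the documented prefix rules. Returns 1 for options that are not per-test.
tests_affected_by() {
  case $1 in
    --simple-*)  echo "simple" ;;
    --complex-*) echo "DocColTest OneShardTest CommunityGraphTest SmartGraphTest EnterpriseGraphTest" ;;
    --doc-*)     echo "DocColTest OneShardTest" ;;
    --graph-*)   echo "CommunityGraphTest SmartGraphTest EnterpriseGraphTest" ;;
    *)           return 1 ;;
  esac
}
```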