Hive-related things that don't belong in the source repo and aren't SOPs.
The monitoring subdirectory contains code that watches cloud resources, e.g. to make sure we're not leaving things around that cost us money unnecessarily.
The monitoring/aws subdirectory contains code that uses the AWS Lambda service to monitor our Hive Team Cluster. There are currently two setups:
- periodicHiveLambdaFunction monitors instance usage and emails a report daily at 6pm ET.
- ciMonitorLambdaFunction looks for leaked resources from CI jobs and emails a report iff any are found. We consider it a leak if any CI cluster (named with the prefix hiveci-) is more than 4h old.
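The leak rule above (a CI cluster with the hiveci- prefix that is more than 4h old) can be sketched as a small filter. This is only an illustration of the rule, not the code that ciMonitorLambdaFunction actually runs; the function name `find_leaked_clusters` and the `(name, created_at)` input shape are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

CI_PREFIX = "hiveci-"          # prefix named in the text above
MAX_AGE = timedelta(hours=4)   # age threshold named in the text above


def find_leaked_clusters(clusters, now=None):
    """Return the names of CI clusters that are more than MAX_AGE old.

    `clusters` is a hypothetical iterable of (name, created_at) pairs,
    where created_at is a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    return [
        name
        for name, created_at in clusters
        if name.startswith(CI_PREFIX) and now - created_at > MAX_AGE
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sample = [
        ("hiveci-abc123", now - timedelta(hours=5)),  # CI cluster, too old
        ("hiveci-def456", now - timedelta(hours=1)),  # CI cluster, still fresh
        ("team-cluster", now - timedelta(days=30)),   # not a CI cluster
    ]
    print(find_leaked_clusters(sample, now=now))  # prints ['hiveci-abc123']
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test without touching any cloud API.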
Prerequisites:
- Ansible CLI, e.g.
sudo yum install ansible
- Python3
- The Python modules listed in monitoring/aws/requirements.txt.
Install via
python3 -m pip install --user -r monitoring/aws/requirements.txt
- Authentication to the Hive team's AWS account, e.g. via a credentials file or environment variables.
- Your AWS user must be a member of the lambda-admin group, or have equivalent permissions.
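A quick local check of the tooling prerequisites might look like the sketch below. It only verifies that a Python 3 interpreter is running and that the `ansible-playbook` command (installed with the Ansible CLI) is on the PATH; it does not check the Python modules, AWS credentials, or group membership. The function name `missing_prerequisites` is made up for this example.

```python
import shutil
import sys


def missing_prerequisites(commands=("ansible-playbook",), min_python=(3, 0)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The running interpreter must be at least min_python.
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    # Each required command must be resolvable on the PATH.
    for cmd in commands:
        if shutil.which(cmd) is None:
            problems.append(f"'{cmd}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```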
To install a playbook, execute its YAML file directly, e.g.:
./monitoring/aws/upload-ci-monitor-lambda.yaml
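To test a lambda function from the AWS console:
- Log into the Hive team's AWS console.
- Navigate to the Code tab for the lambda function you wish to test (periodicHiveLambdaFunction or ciMonitorLambdaFunction).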
- Create or load a Test configuration using the drop-down next to the orange "Test" button.
The JSON payload will depend on the function you're running.
(If testing, you may wish to configure the "recipients" list so the emails only come to you.)
- Punch the orange "Test" button.
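As a rough illustration of a test payload, the sketch below builds a JSON document that sets only the "recipients" list. "recipients" is the only key named above; that it is a top-level list of email address strings is an assumption, and each function may expect additional keys that are not shown here. The helper name `build_test_event` and the address are placeholders.

```python
import json


def build_test_event(recipients):
    """Build a JSON test payload that sets the "recipients" list."""
    if not recipients:
        raise ValueError("at least one recipient is required")
    # Assumption: "recipients" is a top-level list of email addresses.
    return json.dumps({"recipients": list(recipients)}, indent=2)


if __name__ == "__main__":
    # Paste the printed JSON into the console's Test configuration.
    print(build_test_event(["me@example.com"]))
```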