Adobe Platform scripts to bootstrap a CoreOS cluster
& run Mesos/Marathon/Chronos/Zookeeper-Exhibitor.
Provides node-level services as Fleet Units
for every machine in the cluster.
Most services (logging, metrics, monitoring) run on all nodes; some run only on specific tiers, based on the metadata that is injected into Fleet.
The aim of this setup is to move instance provisioning steps to the CoreOS machine level, automated via fleetctl/systemctl. Almost all of our systemd units use docker to run our services. Consequently, we're able to use the vanilla CoreOS EC2 AMI (i.e., we don't bake AMIs at all). That being said, this repo also has methods that deal with sensitive data/secrets to configure various services (more below).
This repository may reference private repositories or scripts. Most should be replaceable with your own, but either way, proceed with caution: this project is highly experimental and certain nuances may not be well documented. If you want to use this repo, you may have to prune the code a bit and edit/delete certain files.
The purpose of this repository is to house all setup scripts and systemd/fleetd units in a central location, separate from our infrastructure provisioning scripts (CloudFormation).
All setup behavior is defined in the `init` script.
Assumptions:
- Your infrastructure has 3 tiers: `control`, `proxy`, `worker`
- ALL nodes run a `bootstrap.service`, whatever that may be.
- Some of the scripts require `/etc/environment` to contain certain variables (usually CloudFormation parameters such as Route 53 entries)
- S3 buckets are set correctly and all required credential files (SSH keys, Datadog & Sumologic credentials) are properly provided to `init` & can be downloaded using behance/docker-aws-s3-downloader
Our `bootstrap.service` just clones this repo and runs the `init` script.
From there, it does a couple of things:
- ensures that any credentials/secure files are downloaded from S3 (to allow docker & git to pull private dependencies)
- configures SSH configs to allow github.com access
- copies `.dockercfg` into `/root` # TODO: refactor process as this is a hack
- runs ALL scripts in `v2/setup`
  - these scripts will always be run with `sudo` (i.e., as root)
  - they set up things like motds, aliases, and drop-ins for various services
- starts up tier-specific template units that are specified by the running machine's IP (provided by CoreOS / cloudinit)
  - these are started via fleet, even though they are NOT global units and run on specific machines
  - the rationale for this is to give us granular control over certain units, such as mesos-slaves. It allows us to control individual nodes, or perform rolling actions (such as deploys), while retaining visibility into the cluster as a whole.
- submits and starts generic fleet units
- Docker Logrotate (based on michaloo/logrotate)
- Docker Image/Container Cleanup
- AWS EC2 Container Registry (ECR) login (Worker Nodes only)
- SSHD mask
- Mesos Master
- Marathon
- Exhibitor (for Zookeeper)
- Chronos
- Flight Director - private Marathon deployment wrapper/manager (stay tuned!)
- HUD - private UI shim for flight-director (stay tuned!)
- CAPCOM - private Container-Proxy Manager (stay tuned!)
- Heatshield Proxy (our version of nginx) or HAProxy
- Mesos Slave
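The tier-specific template units described above can be pictured with a small dry-run sketch. It only prints the `fleetctl` calls instead of running them. The unit name `mesos-slave@.service`, the example IP, and the idea of using the raw IP as the instance name are assumptions for illustration; the actual naming used by `init` may differ.

```shell
#!/usr/bin/env bash
# Sketch only: the template name and the use of the raw IP as the instance
# name are assumptions, not taken from this repo's init script.

# Turn a template unit name (foo@.service) plus an instance into foo@<instance>.service.
instance_unit_name() {
  local template="$1" instance="$2"
  local prefix="${template%%@*}"   # text before the "@"
  local suffix="${template#*@}"    # text after the "@", e.g. ".service"
  printf '%s@%s%s\n' "$prefix" "$instance" "$suffix"
}

# Print (rather than run) the fleetctl calls for one machine.
plan_tier_unit() {
  local template="$1" ip="$2" unit
  unit="$(instance_unit_name "$template" "$ip")"
  echo "fleetctl submit $template"
  echo "fleetctl start $unit"
}

plan_tier_unit "mesos-slave@.service" "10.0.0.12"
```

Because each machine gets its own named instance, a single node's unit can be stopped or restarted by name while the rest of the tier keeps running.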
All secrets & key management is a bit ad hoc. Most of the `setup` scripts, which house the logic for setting up the data for the fleet units to use, require a few things to download secrets & keys:
- the `$CONTROL_TIER_S3SECURE_BUCKET` environment variable, written into `/etc/environment` by CloudFormation
- the behance/docker-aws-s3-downloader container to download files
- IAM roles to access `$CONTROL_TIER_S3SECURE_BUCKET`
Secrets make it onto the nodes in the form of flat text files that live within `$CONTROL_TIER_S3SECURE_BUCKET`. The `setup` files individually know which file(s) they need to download & how to read, set or use the data for their corresponding units. So for example, the datadog unit requires an etcd key, `/ddapikey`. Knowing this, we have a datadog setup script which downloads a `.datadog` file from `$CONTROL_TIER_S3SECURE_BUCKET`, expects it to be in a certain format, and sets the etcd key.
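As a rough illustration of that flow, the sketch below covers only the "check the format, then set the etcd key" part and assumes the `.datadog` file has already been downloaded. The helper names `read_single_key` and `set_datadog_key` and the `ETCDCTL` override are hypothetical, and the real setup script may validate the file differently.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the repo's actual datadog setup script.

# Print the key if the file holds exactly one non-blank line, else fail.
read_single_key() {
  local file="$1" key="" line count=0
  [ -r "$file" ] || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line//[[:space:]]/}"   # drop stray whitespace
    [ -z "$line" ] && continue     # ignore blank lines
    key="$line"
    count=$((count + 1))
  done < "$file"
  [ "$count" -eq 1 ] || return 1
  printf '%s\n' "$key"
}

# Validate the file, then store the key under /ddapikey.
set_datadog_key() {
  local file="$1" key
  key="$(read_single_key "$file")" || { echo "unexpected format: $file" >&2; return 1; }
  # ETCDCTL defaults to the real client; set it to "echo" for a dry run.
  ${ETCDCTL:-etcdctl} set /ddapikey "$key"
}

# Dry run against a throwaway file.
tmp="$(mktemp)"
printf 'abc123\n' > "$tmp"
ETCDCTL=echo set_datadog_key "$tmp"
rm -f "$tmp"
```

Failing early on an unexpected file keeps a malformed secret from being written into etcd, where the unit would pick it up.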
We are planning to deprecate the following in favor of other solutions (DynamoDB + KMS?).
Service | File | Format |
---|---|---|
Datadog | `.datadog` | Just the key. Nothing else. |
Sysdig | `.sysdig` | Just the key. Nothing else. |
Sumologic | `.sumologic` | ID=YOURID<br>SECRET=YOURSECRET |
Flight Director | `.flight-director` | /FD/GITHUB_CLIENT_ID (YOUR GITHUB APP ID)<br>/FD/GITHUB_CLIENT_SECRET (YOUR GITHUB APP SECRET)<br>/FD/GITHUB_ALLOWED_TEAMS org/team |
HUD | `.hud` | /HUD/client-id (GITHUB_APP_ID can == value in .flight-director)<br>/HUD/client-secret (GITHUB_APP_SECRET can == value in .flight-director) |
Marathon | `.marathon` | /marathon/username a-username<br>/marathon/password a-password |
AWS ECR | `.ecr` | /ECR/region (ECR AWS Region)<br>/ECR/registry-account (ECR AWS Account) |
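For the files that list etcd paths (such as `.marathon`), a setup script could load them roughly as sketched here. This assumes one `<etcd-key> <value>` pair per line, which is a reading of the table rather than a documented format; `load_etcd_pairs` and the `ETCDCTL` override are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: assumes one "<etcd-key> <value>" pair per line.

load_etcd_pairs() {
  local file="$1" key value
  while read -r key value || [ -n "$key" ]; do
    case "$key" in
      # Only lines that start with an absolute etcd path are applied.
      /*) ${ETCDCTL:-etcdctl} set "$key" "$value" < /dev/null ;;
      # Blank lines and anything else are skipped.
      *)  continue ;;
    esac
  done < "$file"
}

# Dry run: ETCDCTL=echo prints the calls instead of touching etcd.
tmp="$(mktemp)"
printf '/marathon/username a-username\n/marathon/password a-password\n' > "$tmp"
ETCDCTL=echo load_etcd_pairs "$tmp"
rm -f "$tmp"
```

The `.sumologic` file uses `KEY=VALUE` lines instead, so it would need its own handling.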
- `.dockercfg` to download private containers
- `id_rsa` to clone any private repositories
Nothing special needs to be done for these two, as long as the CloudFormation template sets the following in `/etc/environment`:
$SECURE_FILES=.dockercfg:id_rsa,0600,.ssh/id_rsa
The format of this environment variable just needs to conform to behance/docker-aws-s3-downloader
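To make the example value easier to read, the sketch below splits it into its parts. It assumes entries are separated by `:` and that each entry is `source[,mode[,destination]]`; that reading is inferred from the example alone, so check it against the downloader's own documentation. `describe_secure_files` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: the "source[,mode[,dest]]" reading of each entry is inferred
# from the example value, not from the downloader's documentation.

describe_secure_files() {
  local spec="$1" entry src mode dest
  local -a entries
  # Split the whole value on ":" into one entry per file.
  IFS=':' read -r -a entries <<< "$spec"
  for entry in "${entries[@]}"; do
    # Split each entry on "," into up to three fields.
    IFS=',' read -r src mode dest <<< "$entry"
    # "-" marks a field that the entry leaves out.
    echo "source=$src mode=${mode:--} dest=${dest:--}"
  done
}

describe_secure_files ".dockercfg:id_rsa,0600,.ssh/id_rsa"
```

Under this reading, `.dockercfg` is fetched with no extra options, while `id_rsa` is given mode `0600` and placed at `.ssh/id_rsa`.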