The task scheduler on Drove
Epoch is task runner that allows you to run stand-alone Tasks on DROVE
Think of it being very similar to Chronos, but on Drove and not Mesos.
- Self-serve UI for creating and managing tasks
- Ability to create a topology and schedule a recurring task
- Ability to run the above topology instantly
TASK: The primary unit of execution in Epoch is a Task. A task is a stand-alone unit of execution that run on
Drove.
TOPOLOGY: A topology is a definition of how the task is run - it can be scheduled to run at a particular time or at a
recurring intervals.
- Instant Run: A topology can be run instantly at a particular time
- Scheduled Run: Every topology is created with a QUARTZ cron expression. This expression determines the schedule of
the topology.
⚠️ Tasks On Drove: Today, Drove only supports running tasks that are packaged as docker images. So, the task that you want to run on Drove should be packaged as a docker image and pushed to a Docker registry.
ℹ️ Quartz Cron: Epoch uses Quartz cron expressions to schedule tasks. You can read more about Quartz cron expressions here
The first important aspect is to understand Leader Election and how requests are routed across different epoch
instances.
A single Epoch node is the leader to all requests from the UI, and is responsible for running all Topologies/Tasks, and doing the
various state management.
The remaining nodes route requests to the leader node.
This is done using a simple Zookeeper based leader election, and a Routing Servlet Filter.
The actual execution of the task is done on Drove. Epoch is only responsible for managing the state of the task, and the
topology.
So the resource intensive operation (of actually running the docker task) is outside of Epoch. Since only tracking these
tasks through a scheduler is the only thing that Epoch does, one node of epoch should scale for 10s of thousands of task
runs.
Should we need to scale this further, we can always distribute the tasks across multiple nodes.
But until then, with the choice of not distributing, tracking all of it through a single instance is more scalable.
Internally, Epoch uses Kaal for scheduling the status checks and the topology runs in memory.
A TLDR version of the request routing is represented below
As explained earlier, a task is a stand-alone unit of execution.
The following diagram shows the various states of a task. This is inline with the states of a task in Drove.
The state of the task determines the state of a topology run.
The following diagram shows the various states of a specific run of the Topology
And finally, the above is only applicable if the Topology is not PAUSED. This is purely determined by the state of the
Topology, set using the UI
The following shows the various states of a topology
Epoch uses Zookeeper to store the tasks and topologies. The following diagram shows the structure of the data in Zookeeper
The Epoch server container is available at ghcr.io.
The container is intended to be run on a Drove cluster.
The following environment variables are understood by the container:
Variable Name | Required without external config | Description |
---|---|---|
ZK_CONNECTION_STRING | Yes. Unnecessary if config is being injected. | Connection String for the Zookeeper Cluster |
DROVE_ENDPOINT | Yes. Unnecessary if config is being injected. | HTTP(S) endpoint for the Drove cluster |
DROVE_APP_NAME | Injected by Drove | App name for the container on the Drove cluster. Do not keep changing this as it will lose stored job context. |
CONFIG_FILE_PATH | To use custom config file. Can be put on some executor and volume mounted in the container. | By default config file in /home/default/config.yml is used. |
ADMIN_PASSWORD | Optional but Recommended. Unnecessary if config is injected. | Password for the user admin which has read/write permissions. Default value is admin . |
GUEST_PASSWORD | Optional but Recommended. Unnecessary if config is injected. | Password for the user guest which has read only permissions. Default value is guest . |
GC_ALGO | Optional | GC to be used by JVM. By default G1GC is used. |
JAVA_PROCESS_MIN_HEAP | Optional | Minimum Java Heap size. Set to 1 GB by default. |
JAVA_PROCESS_MAX_HEAP | Optional | Maximum Java Heap size. Set to 1 GB by default. |
JAVA_OPTS | Optional | Additional java options. |
DEBUG | Optional | Set to a non-zero value to print environment variables etc. Note: This will print all env variables. |
The following is a sample app specification for deploying the epoch container on drove.
{
"name": "epoch",
"version": "1",
"executable": {
"type": "DOCKER",
"url": "ghcr.io/phonepe/epoch:latest",
"dockerPullTimeout": "2 minute"
},
"exposedPorts": [
{
"name": "main",
"port": 8080,
"type": "HTTP"
},
{
"name": "admin",
"port": 8081,
"type": "HTTP"
}
],
"volumes": [],
"type": "SERVICE",
"logging": {
"type": "LOCAL",
"maxSize": "10m",
"maxFiles": 3,
"compress": true
},
"resources": [
{
"type": "MEMORY",
"sizeInMB": 4096
},
{
"type": "CPU",
"count": 1
}
],
"placementPolicy": {
"type": "ANY"
},
"healthcheck": {
"mode": {
"type": "HTTP",
"protocol": "HTTP",
"portName": "admin",
"path": "/healthcheck",
"verb": "GET",
"successCodes": [
200
],
"payload": "",
"connectionTimeout": "20 seconds"
},
"timeout": "5 seconds",
"interval": "20 seconds",
"attempts": 3,
"initialDelay": "0 seconds"
},
"readiness": {
"mode": {
"type": "HTTP",
"protocol": "HTTP",
"portName": "admin",
"path": "/healthcheck",
"verb": "GET",
"successCodes": [
200
],
"payload": "",
"connectionTimeout": "3 seconds"
},
"timeout": "3 seconds",
"interval": "10 seconds",
"attempts": 3,
"initialDelay": "10 seconds"
},
"tags": {},
"env": {
"JAVA_PROCESS_MIN_HEAP": "2g",
"JAVA_PROCESS_MAX_HEAP": "2g",
"ADMIN_PASSWORD" : "adminpassword",
"GUEST_PASSWORD" : "guestpassword",
"ZK_CONNECTION_STRING" : "<YOUR_ZK_CONNECTION_STRING>",
"DROVE_ENDPOINT" : "<YOUR_DROVE_ENDPOINT>"
},
"exposureSpec": {
"vhost": "epoch.<YOUR_DOMAIN>",
"portName": "main",
"mode": "ALL"
},
"preShutdown": {
"hooks": [],
"waitBeforeKill": "30 seconds"
},
"instances" : 1
}
Replace the following in the above:
- YOUR_ZK_CONNECTION_STRING - Connection string for your zookeeper cluster
- YOUR_DROVE_ENDPOINT - HTTP(S) endpoint for Drove
- YOUR_DOMAIN - Domain on which you want drove-nixy to configure nginx vhost
Save the json in some file say epoch.json
Deploy to drove using the following command:
drove -c testing apps create epoch.json
This will create an app called epoch-1
.
Scale the app to one instance.
drove -c oss apps deploy epoch-1 1 -w
The drove app should be available at http://epoch.<YOU_DOMAIN>.
Apache 2