Resque scheduler #93

Merged: 4 commits, Feb 15, 2017

3 changes: 3 additions & 0 deletions cookbooks/resque_scheduler/attributes/default.rb
@@ -0,0 +1,3 @@
default['resque_scheduler'] = {
'is_resque_scheduler_instance' => (node['dna']['instance_role'] == 'solo') || (node['dna']['instance_role'] == 'util' && node['dna']['name'] == 'resque'),
}
157 changes: 157 additions & 0 deletions cookbooks/resque_scheduler/files/default/resque-scheduler
@@ -0,0 +1,157 @@
#!/bin/bash
# Shamelessly stolen from the resque wrapper and modified for just scheduler.
#
# This script starts and stops the Resque scheduler daemon.
# The resque_scheduler recipe installs it at /data/<app>/shared/bin/resque-scheduler.
#
# Changes for consideration to resque script:
# * Problems to be solved:
# ** Resque logs overwritten on start and stop (DONE)
# ** Time for a worker to stop
# It's perfectly reasonable to expect a worker to need several minutes to stop.
# In the Resque README, GitHub uses Resque to archive things and expects it to take
# 10 minutes, for example, so a variable $WAIT_TIME is set.
# ** Time for a worker to be allowed to run before being regarded as stale
# ** We need to kill with -QUIT first, before simply trying to terminate with -15
# ** We've found that the process of iterating over children and killing them (in this case the children
# are the rake tasks launched by su) often does not work, i.e. they don't die. This needs
# fixing. People are working around it by killing the su (i.e. the PID in the PID file), but that
# kills the children abruptly.
# ** ## INVALID ##
# We'd like to be able to use the COUNT= parameter. This saves some memory (30MB+ per worker)
# but strips us of the ability to track the workers by memory, so an accompanying cron script
# is required.
# ## Not going to do this. The Resque source code mentions that this is only intended
# ## for use in development, so who are we to argue!
# ** We'd like to be able to request a sudden death if required
# ** Instancing. We may want > 1 worker instance (which currently equates to a monit stanza)
# to process a queue. The correct way is probably to have a different conf for each stanza.
# One customer (limos on ey05-s00522) worked around this with an instance parameter.
# One advantage of this over, say, having > 1 conf file specify the same queue is that
# logging is arguably more handy per queue than per conf, and we can't do that by using
# several conf files to specify 1 queue (as it stands).
# ** Add pause/continue functionality (USR2/CONT)
# ** Add kill child functionality (USR1)
# ** Bug where monit either removes a PID file, or fails to write one, resulting in at least
# one rogue worker. << DONE, hopefully!

usage() {
echo "Usage: $0 <appname> {start|stop} <environment>"
exit 1
}

if [ $# -lt 3 ]; then usage; fi

if [ "`whoami`" != "root" ]; then
logger -t `basename $0` -s "Must be run as root"
exit 1
fi

# Basic setup of default values
APP=$1; ACTION=$2; RACK_ENV=$3;

# Paths
PATH=/data/$APP/current/ey_bundler_binstubs:$PATH
CURDIR=`pwd`

APP_DIR="/data/${APP}"
APP_ROOT="${APP_DIR}/current"
APP_SHARED="${APP_DIR}/shared"
APP_CONFIG="${APP_SHARED}/config"

clean_exit() {
cd $CURDIR
# Exit with the status passed in, falling back to $RESULT (or 0)
exit ${1:-${RESULT:-0}}
}

WORKER_REF="resque_scheduler"
LOG_FILE="/data/$APP/current/log/$WORKER_REF.log"
LOCK_FILE="/tmp/${WORKER_REF}_${APP}.monit-lock"
PID_FILE="/var/run/engineyard/resque-scheduler/$APP/$WORKER_REF.pid"
GEMFILE="$APP_ROOT/Gemfile"
RAKE="rake"

if [ -f $GEMFILE ];then
RAKE="$APP_ROOT/ey_bundler_binstubs/rake"
fi

if [ -d $APP_ROOT ]; then
USER=$(stat -L -c"%U" $APP_ROOT)
export HOME="/home/$USER"

# Fix for SD-3786 - stop sending in VERBOSE= and VVERBOSE= by default
if declare -p VERBOSE >/dev/null 2>&1; then export V="VERBOSE=$VERBOSE"; fi
if declare -p VVERBOSE >/dev/null 2>&1; then export VV="VVERBOSE=$VVERBOSE"; fi

# Older versions of sudo need us to call env for the env vars to be set correctly
COMMAND="/usr/bin/env $V $VV APP_ROOT=${APP_ROOT} RACK_ENV=${RACK_ENV} RAILS_ENV=${RACK_ENV} MERB_ENV=${RACK_ENV} $RAKE -f ${APP_ROOT}/Rakefile resque:scheduler"

if [ ! -d /var/run/engineyard/resque-scheduler/$APP ]; then
mkdir -p /var/run/engineyard/resque-scheduler/$APP
fi

# handle the second param, don't start if already existing
if [ -f $LOCK_FILE ]; then
logger -t "monit-resquescheduler[$$]:" "Monit already messing with $WORKER_REF (`cat $LOCK_FILE`)"
clean_exit 1
else
echo $$ > $LOCK_FILE
fi

case "$ACTION" in
start)
cd /data/$APP/current
logger -t "monit-resquescheduler[$$]:" "Starting Resque worker $WORKER_REF"
if [ -f $PID_FILE ]; then
PID=`cat $PID_FILE`
if [ -d /proc/$PID ]; then
logger -t "monit-resquescheduler[$$]:" "Resque worker $WORKER_REF is already running with $PID."
RESULT=1
else
rm -f $PID_FILE
logger -t "monit-resquescheduler[$$]:" "Removing stale worker file ($PID_FILE) for pid $PID"
fi
fi
if [ ! -f $PID_FILE ]; then
exec su -c"$COMMAND" $USER >> $LOG_FILE 2>&1 &
RESULT=$?
logger -t "monit-resquescheduler[$$]:" "Started with pid $! and exit $RESULT"
#while [ ! -f $PID_FILE ]
#do
echo $! > $PID_FILE
sleep .1
#done
else
RESULT=1
fi
rm $LOCK_FILE
clean_exit $RESULT
;;
stop)
logger -t "monit-resquescheduler[$$]:" "Stopping Resque worker $WORKER_REF"
if [ -f $PID_FILE ]; then
PID=`cat $PID_FILE`
# Ask the scheduler to shut down, then wait up to 75 seconds before sending SIGKILL
kill -TERM $PID && sleep 30
SLEEP_COUNT=0
while [ -n "$PID" ] && [ -d /proc/$PID ]; do
sleep 15
SLEEP_COUNT=$((SLEEP_COUNT + 15))
if [ "$SLEEP_COUNT" -gt 30 ]; then
kill -9 $PID 2>/dev/null; true
logger -t "monit-resquescheduler[$$]:" "Murdering Resque worker with $PID for $WORKER_REF"
break
fi
done
fi
[ -e "$PID_FILE" ] && [ ! -d /proc/$PID ] && rm -f $PID_FILE
rm $LOCK_FILE
clean_exit 0
;;
*)
# usage exits, so release the lock first
rm $LOCK_FILE
usage
;;
esac
else
echo "/data/$APP/current doesn't exist."
usage
fi
5 changes: 5 additions & 0 deletions cookbooks/resque_scheduler/metadata.rb
@@ -0,0 +1,5 @@
name 'resque_scheduler'
description 'Configuration & deployment of resque-scheduler on Engine Yard'
maintainer 'Engine Yard'
maintainer_email '[email protected]'
version '1.0'
37 changes: 37 additions & 0 deletions cookbooks/resque_scheduler/recipes/default.rb
@@ -0,0 +1,37 @@
#
# Cookbook Name:: resque-scheduler
# Recipe:: default
#
if node['resque_scheduler']['is_resque_scheduler_instance']
execute 'install resque gem' do
command 'gem install resque redis redis-namespace yajl-ruby -r'
# A string guard runs as a shell command; a block that returns a string is always truthy
not_if 'gem list | grep resque'
end

node['dna']['applications'].each do |app, _data|
template "/etc/monit.d/resque_scheduler_#{app}.monitrc" do
owner 'root'
group 'root'
mode 0644
source 'resque-scheduler.monitrc.erb'
variables(
app_name: app,
rails_env: node['dna']['environment']['framework_env']
)
end

cookbook_file "/data/#{app}/shared/bin/resque-scheduler" do
source 'resque-scheduler'
owner 'root'
group 'root'
mode 0755
backup 0
end
end

execute 'ensure-resque-is-setup-with-monit' do
command %(
monit reload
)
end
end
@@ -0,0 +1,6 @@
check process resque-scheduler_<%= @app_name %>
with pidfile /var/run/engineyard/resque-scheduler/<%= @app_name %>/resque_scheduler.pid
start program = "/data/<%= @app_name %>/shared/bin/resque-scheduler <%= @app_name %> start <%= @rails_env %>" with timeout 120 seconds
stop program = "/data/<%= @app_name %>/shared/bin/resque-scheduler <%= @app_name %> stop <%= @rails_env %>" with timeout 120 seconds # on purpose
if totalmem is greater than 300 MB for 2 cycles then restart # eating up memory?
group <%= @app_name %>_resque-scheduler
6 changes: 6 additions & 0 deletions examples/resque_scheduler/README.md
@@ -0,0 +1,6 @@
# Resque Scheduler

This example contains a complete `cookbooks/` directory that you can use on the stable-v5 stack.

See https://github.com/engineyard/ey-cookbooks-stable-v5/tree/next-release/examples/resque_scheduler/cookbooks/custom-resque_scheduler for complete instructions.

3 changes: 3 additions & 0 deletions examples/resque_scheduler/cookbooks/custom-redis/README.md
@@ -0,0 +1,3 @@
# Custom Redis

This custom-redis recipe is used in the Resque scheduler example to install Redis on the utility instance named "resque" instead of the default "redis".
Contributor

Should we include this small Redis wrapper at all? I'm leaning towards adding a section to the Resque-Scheduler wrapper's README telling the customer to enable the full Redis recipe (as we do with Sidekiq).

Contributor

I opened PR #150 to remove custom-redis from resque_scheduler

@@ -0,0 +1 @@
default['redis']['utility_name'] = 'resque'
Contributor

For a clustered env, this line should be replaced with at least:

default['redis'].tap do |redis|

  # Collect the redis instances in this array
  redis_instances = []

  # Run Redis on a named util instance
  # This is the default
  redis['utility_name'] = 'resque'
  redis_instances << redis['utility_name']
  redis['is_redis_instance'] = (
    node['dna']['instance_role'] == 'util' &&
    redis_instances.include?(node['dna']['name'])
  )
end

and for a solo env with:

default['redis'].tap do |redis|

  # Collect the redis instances in this array
  redis_instances = []

  # Run redis on a solo instance
  # Not recommended for production environments
  redis['is_redis_instance'] = (node['dna']['instance_role'] == 'solo')
end

3 changes: 3 additions & 0 deletions examples/resque_scheduler/cookbooks/custom-redis/metadata.rb
@@ -0,0 +1,3 @@
name 'custom-redis'

depends 'redis'
@@ -0,0 +1 @@
include_recipe 'redis'
104 changes: 104 additions & 0 deletions examples/resque_scheduler/cookbooks/custom-resque_scheduler/README.md
@@ -0,0 +1,104 @@
# Custom Resque Scheduler

The Resque scheduler recipe creates a resque-scheduler script and a monit config file. Each application on the environment will get its own Resque scheduler.

This recipe depends on the resque recipe.
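Each application ends up with its own monit stanza, wrapper script, PID file and log. As a rough illustration, the plain-Ruby sketch below derives those names for one application. The paths are copied from `recipes/default.rb`, the monitrc template and the wrapper script in this cookbook; the `scheduler_layout` helper and the `todo` app name are hypothetical and not part of the cookbook.

```ruby
# Hypothetical helper (not part of the cookbook): the per-application paths and
# monit group used by the resque_scheduler recipe, monitrc template and wrapper
# script for a given application name.
def scheduler_layout(app)
  {
    monitrc:     "/etc/monit.d/resque_scheduler_#{app}.monitrc",
    script:      "/data/#{app}/shared/bin/resque-scheduler",
    pid_file:    "/var/run/engineyard/resque-scheduler/#{app}/resque_scheduler.pid",
    log_file:    "/data/#{app}/current/log/resque_scheduler.log",
    monit_group: "#{app}_resque-scheduler",
  }
end

# Print the layout for an example application named "todo".
scheduler_layout('todo').each { |key, value| puts "#{key}: #{value}" }
```

The monit group is the name the deploy hooks pass to `monit -g` when restarting the scheduler.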

## Installation

For simplicity, we recommend that you create the cookbooks directory at the root of your application. If you prefer to keep the infrastructure code separate from application code, you can create a new repository.

The main cookbooks include a `resque_scheduler` recipe, but it is not enabled by default. To use it, copy the `custom-resque_scheduler` wrapper recipe. Do not copy the `resque_scheduler` recipe itself; that one is managed by Engine Yard.

1. Edit `cookbooks/ey-custom/recipes/after-main.rb` and add

   ```
   include_recipe 'custom-resque_scheduler'
   ```
2. Edit `cookbooks/ey-custom/metadata.rb` and add
   ```
   depends 'custom-resque_scheduler'
   ```
3. Copy `examples/resque_scheduler/cookbooks/custom-resque_scheduler` to `cookbooks/`
   ```
   cd ~ # Change this to your preferred directory. Anywhere but inside the application
   git clone https://github.com/engineyard/ey-cookbooks-stable-v5
   cd ey-cookbooks-stable-v5
   cp -r examples/resque_scheduler/cookbooks/custom-resque_scheduler /path/to/app/cookbooks/
   ```
4. Install the ey-core gem on your local machine and upload the recipes
   ```
   gem install ey-core
   ey-core recipes upload --environment <nameofenvironment> --path <pathtocookbooksfolder> --apply
   ```
If you do not have `cookbooks/ey-custom` in your app repository, you can copy `examples/resque_scheduler/cookbooks/ey-custom` to `/path/to/app/cookbooks`.

## Customizations

All customizations go to `cookbooks/custom-resque_scheduler/attributes/default.rb`.

### Choose the instances that run the recipe

By default, the resque_scheduler recipe runs on a utility instance named `resque` or on a solo instance. You can change this using `node['dna']['instance_role']` and `node['dna']['name']`.
```ruby
# this is the default
default['resque_scheduler']['is_resque_scheduler_instance'] = (node['dna']['instance_role'] == 'solo') || (node['dna']['instance_role'] == 'util' && node['dna']['name'] == 'resque')
# run the recipe on a utility instance named background_workers
default['resque_scheduler']['is_resque_scheduler_instance'] = (node['dna']['instance_role'] == 'util' && node['dna']['name'] == 'background_workers')
# run the recipe on a solo instance only
default['resque_scheduler']['is_resque_scheduler_instance'] = (node['dna']['instance_role'] == 'solo')
```
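Because the selector is an ordinary Ruby boolean expression, it can be checked outside Chef before uploading. A minimal sketch, assuming only the `instance_role` and `name` keys used above; the `scheduler_instance?` helper and the sample `dna` hashes are hypothetical stand-ins, not real Engine Yard output.

```ruby
# Hypothetical helper mirroring the default selector: true on a solo instance,
# or on a utility instance whose name matches util_name.
def scheduler_instance?(dna, util_name = 'resque')
  dna['instance_role'] == 'solo' ||
    (dna['instance_role'] == 'util' && dna['name'] == util_name)
end

# Made-up dna fragments for a few instance types.
samples = {
  'solo'                    => { 'instance_role' => 'solo' },
  'util named resque'       => { 'instance_role' => 'util', 'name' => 'resque' },
  'util background_workers' => { 'instance_role' => 'util', 'name' => 'background_workers' },
  'app master'              => { 'instance_role' => 'app_master' },
}

samples.each do |label, dna|
  puts "#{label}: #{scheduler_instance?(dna)}"
end
```

Passing `'background_workers'` as `util_name` selects that utility instance instead, but unlike the second attribute example above, this sketch still returns true on solo instances.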

### Choose the applications that have Resque scheduler

By default, all applications in an environment will have Resque scheduler. You can change this by specifying an array of application names.

If you only have one application, you don't need to make any changes to `default['resque_scheduler']['applications']`.

```ruby
# this is the default
# get all applications
default['resque_scheduler']['applications'] = node['dna']['applications'].map { |app_name, _data| app_name }

# specify the application name
default['resque_scheduler']['applications'] = %w[todo]
```
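The default expression keeps only the keys of the applications hash. A quick plain-Ruby check with a made-up stand-in for `node['dna']['applications']` shows what each assignment evaluates to; the hash contents below are hypothetical, and real values come from the instance's dna.

```ruby
# Made-up stand-in for node['dna']['applications']; only the keys matter here.
applications = {
  'todo'  => {},
  'admin' => {},
}

# Same expression as the default attribute: collect every application name.
all_apps = applications.map { |app_name, _data| app_name }

# Explicit list, as in the second assignment.
only_todo = %w[todo]

puts all_apps.inspect   # ["todo", "admin"]
puts only_todo.inspect  # ["todo"]
```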

## Restarting Resque scheduler

This recipe does NOT restart Resque scheduler. The reason for this is that shipping your application and rebuilding your instances (i.e. running chef) are not always done at the same time. It is best to restart your Resque scheduler when you ship (deploy) your application code.

If you're running Resque on a solo instance or on your app master, add a deploy hook similar to:

```ruby
on_app_master do
  sudo "monit -g #{config.app}_resque-scheduler restart all"
end
```

On the other hand, if you're running Resque scheduler on a dedicated utility instance, the deploy hook should look like:

```ruby
on_utilities :resque do
  sudo "monit -g #{config.app}_resque-scheduler restart all"
end
```

where `resque` is the name of the utility instance.

You likely want to use the after_restart hook for this. Put the code above in `deploy/after_restart.rb`.

See our [Deploy Hook](https://engineyard.zendesk.com/entries/21016568-use-deploy-hooks) documentation for more information on using deploy hooks.

You can also stop the Resque scheduler at the start of the deploy if necessary. Check https://support.cloud.engineyard.com/hc/en-us/articles/205407428-Configure-and-Deploy-Resque for more information.
@@ -0,0 +1,3 @@
# default['resque_scheduler']['is_resque_scheduler_instance'] = (node['dna']['instance_role'] == 'util' && node['dna']['name'] == 'resque')
# default['resque_scheduler']['applications'] = %w[todo]
# default['resque_scheduler']['worker_count'] = 4
@@ -0,0 +1,3 @@
name 'custom-resque_scheduler'

depends 'resque_scheduler'
@@ -0,0 +1 @@
include_recipe 'resque'
Contributor

It should be `include_recipe 'resque_scheduler'`.

4 changes: 4 additions & 0 deletions examples/resque_scheduler/cookbooks/ey-custom/metadata.rb
@@ -0,0 +1,4 @@
name 'ey-custom'

depends 'custom-redis'
depends 'custom-resque_scheduler'
@@ -0,0 +1,2 @@
include_recipe 'custom-redis'
include_recipe 'custom-resque_scheduler'