docs: engineering playbooks
abuaboud committed Feb 10, 2025
1 parent 5599bc4 commit 6c3f36f
Showing 1 changed file with 19 additions and 5 deletions.
24 changes: 19 additions & 5 deletions docs/handbook/engineering/onboarding/on-call.mdx
@@ -3,8 +3,6 @@ title: 'On-Call'
icon: 'phone'
---

The on-call rotation is a simple strategy to ensure there is always someone available to fix the issue for the users, each engineer is responsible for a week and the rotation is done by the team.

## Prerequisites:
- [Setup Incident IO](../playbooks/setup-incident-io)

@@ -20,6 +18,23 @@ We need to ensure there is **exactly one person** at the same time who is the ma
If you ever feel burned out in the middle of your rotation, please reach out to the team and we will help you with the rotation or take over the responsibility.
</Tip>

## On-Call Schedule

The on-call rotation is managed through Incident.io, with each engineer taking a one-week shift. You can:
- View the current schedule and upcoming rotations on [Incident.io On-Call Schedule](https://app.incident.io/activepieces/on-call/schedules)
- Add the schedule to your Google Calendar using [this link](https://calendar.google.com/calendar/r?cid=webcal://app.incident.io/api/schedule_feeds/cc024d13704b618cbec9e2c4b2415666dfc8b1efdc190659ebc5886dfe2a1e4b)

<Warning>
Make sure to update the on-call schedule in Incident.io if you cannot be available during your assigned rotation. This ensures alerts are routed to the correct person and maintains our incident response coverage.

To modify the schedule:
1. Go to [Incident.io On-Call Schedule](https://app.incident.io/activepieces/on-call/schedules)
2. Find your rotation slot
3. Click "Override schedule" to mark your unavailability
4. Coordinate with the team to find coverage for your slot
</Warning>
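
The rule that a weekly rotation applies unless an override is in place can be sketched as below. This is only an illustration of the idea, not how Incident.io implements schedules: the `Override` type, the `resolveOnCall` function, and the engineer names are all hypothetical, and it assumes fixed seven-day shifts counted from a known rotation start.

```typescript
// Hypothetical model of a schedule override; Incident.io's real data model may differ.
interface Override {
  engineer: string;
  start: Date; // inclusive
  end: Date; // exclusive
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Returns the single engineer who is on call at the given instant.
function resolveOnCall(
  rotation: string[],
  rotationStart: Date,
  overrides: Override[],
  at: Date,
): string {
  if (rotation.length === 0) {
    throw new Error('rotation must not be empty');
  }
  const t = at.getTime();

  // An active override always wins over the regular rotation.
  const active = overrides.find(
    (o) => o.start.getTime() <= t && t < o.end.getTime(),
  );
  if (active) {
    return active.engineer;
  }

  // Otherwise pick the engineer by counting whole weeks since the rotation began.
  const weeksElapsed = Math.floor((t - rotationStart.getTime()) / WEEK_MS);
  const n = rotation.length;
  const index = ((weeksElapsed % n) + n) % n; // stays non-negative before rotationStart
  return rotation[index];
}
```

Because the function returns exactly one name for any instant, it mirrors the requirement that one person is responsible at a time; an override simply replaces that name for its window.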


## What it means to be on-call

The primary objective of being on-call is to triage issues and assist users. It is not about fixing the issues or coding missing features. Delegation is key whenever possible.
@@ -30,15 +45,15 @@ You are responsible for the following:

* Check [community.activepieces.com](https://community.activepieces.com) for any new issues or to learn about existing issues.

* Respond once the pager run.
* Monitor your Incident.io notifications and respond promptly when paged.

<Tip>
**Friendly Tip #1**: always escalate to the team if you are unsure what to do.
</Tip>

## How do you get paged?

Monitor and respond to incidents that come through these channels:

#### Slack Fire Emoji (🔥)
When a customer reports an issue in Slack and someone reacts with 🔥, you'll be automatically paged and a dedicated incident channel will be created.
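
To make the trigger concrete, here is a minimal sketch of the decision "does this reaction page someone?". The actual integration is handled by Incident.io and is not shown in this handbook; the `ReactionEvent` shape is a simplified stand-in modeled on Slack's `reaction_added` event, and `incidentKeyFor`, `shouldPage`, and the de-duplication by message are assumptions made for illustration.

```typescript
// Simplified, hypothetical event shape modeled on Slack's `reaction_added` event.
interface ReactionEvent {
  type: string;
  reaction: string; // emoji name without colons, e.g. "fire"
  item: { type: string; channel: string; ts: string };
}

// Returns a key identifying the reported message, or null if the event is not a 🔥 on a message.
function incidentKeyFor(event: ReactionEvent): string | null {
  if (event.type !== 'reaction_added') return null;
  if (event.reaction !== 'fire') return null;
  if (event.item.type !== 'message') return null;
  return `${event.item.channel}:${event.item.ts}`;
}

// Assumption: a second 🔥 on the same message should not page again.
function shouldPage(event: ReactionEvent, openIncidents: Set<string>): boolean {
  const key = incidentKeyFor(event);
  if (key === null || openIncidents.has(key)) {
    return false;
  }
  openIncidents.add(key);
  return true;
}
```

In this sketch, other emoji and repeated 🔥 reactions on the same message are ignored, so each reported message leads to at most one page.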
@@ -47,4 +62,3 @@ When a customer reports an issue in Slack and someone reacts with 🔥, you'll b
Watch for notifications from:
- DigitalOcean about CPU, Memory, or Disk outages
- Checkly about e2e test failures or website downtime
