From 6c3f36f106bcc7856991b4824907bf3e5c1affc8 Mon Sep 17 00:00:00 2001
From: Mohammad AbuAboud
Date: Mon, 10 Feb 2025 19:32:45 +0000
Subject: [PATCH] docs: engineering playbooks

---
 .../engineering/onboarding/on-call.mdx | 24 +++++++++++++++----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/docs/handbook/engineering/onboarding/on-call.mdx b/docs/handbook/engineering/onboarding/on-call.mdx
index 23396f0cd5..9f8eb327fa 100644
--- a/docs/handbook/engineering/onboarding/on-call.mdx
+++ b/docs/handbook/engineering/onboarding/on-call.mdx
@@ -3,8 +3,6 @@ title: 'On-Call'
 icon: 'phone'
 ---
 
-The on-call rotation is a simple strategy to ensure there is always someone available to fix the issue for the users, each engineer is responsible for a week and the rotation is done by the team.
-
 ## Prerequisites:
 
 - [Setup Incident IO](../playbooks/setup-incident-io)
@@ -20,6 +18,23 @@ We need to ensure there is **exactly one person** at the same time who is the ma
 
 If you ever feel burn out in middle of your rotation, please reach out to the team and we will help you with the rotation or take over the responsibility.
 
+## On-Call Schedule
+
+The on-call rotation is managed through Incident.io, with each engineer taking a one-week shift. You can:
+- View the current schedule and upcoming rotations on [Incident.io On-Call Schedule](https://app.incident.io/activepieces/on-call/schedules)
+- Add the schedule to your Google Calendar using [this link](https://calendar.google.com/calendar/r?cid=webcal://app.incident.io/api/schedule_feeds/cc024d13704b618cbec9e2c4b2415666dfc8b1efdc190659ebc5886dfe2a1e4b)
+
+
+Make sure to update the on-call schedule in Incident.io if you cannot be available during your assigned rotation. This ensures alerts are routed to the correct person and maintains our incident response coverage.
+
+To modify the schedule:
+1. Go to [Incident.io On-Call Schedule](https://app.incident.io/activepieces/on-call/schedules)
+2. Find your rotation slot
+3. Click "Override schedule" to mark your unavailability
+4. Coordinate with the team to find coverage for your slot
+
+
+
 ## What it means to be on-call
 
 The primary objective of being on-call is to triage issues and assist users. It is not about fixing the issues or coding missing features. Delegation is key whenever possible.
@@ -30,7 +45,7 @@ You are responsible for the following:
 
 * Check [community.activepieces.com](https://community.activepieces.com) for any new issues or to learn about existing issues.
 
-* Respond once the pager run.
+* Monitor your Incident.io notifications and respond promptly when paged.
 
 **Friendly Tip #1**: always escalate to the team if you are unsure what to do.
 
@@ -38,7 +53,7 @@ You are responsible for the following:
 
 ## How do you get paged?
 
- Monitor and respond to incidents that come through these channels:
+Monitor and respond to incidents that come through these channels:
 
 #### Slack Fire Emoji (🔥)
 When a customer reports an issue in Slack and someone reacts with 🔥, you'll be automatically paged and a dedicated incident channel will be created.
@@ -47,4 +62,3 @@ When a customer reports an issue in Slack and someone reacts with 🔥, you'll b
 Watch for notifications from:
 - Digital Ocean about CPU, Memory, or Disk outages
 - Checkly about e2e test failures or website downtime
-