Sudeep Chaudhari

On-call handover doesn't have to be so manual

Updated: Sep 13, 2023





In today's fast-paced digital landscape, the need for continuous availability and rapid incident response has led to the widespread adoption of on-call rotations among engineering teams. On-call duty allows organizations to address critical issues outside regular working hours, ensuring uninterrupted service for customers.


However, handing over on-call responsibilities from one engineer to another presents its own set of challenges. Our survey and research show that, on average, engineers spend 1.5 to 2 hours per week writing handover documentation. That is roughly $9,000 to $13,000 per team per year (assuming $250k in annual engineering compensation spread over 250 working days). If your company has 10 teams, the cost is 10x that. The problem is most pronounced at companies with complex systems that require heavy on-call duty, but it shows up with varying severity across all technology-based firms.
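The estimate above can be reproduced with a quick back-of-the-envelope calculation. The sketch below is only illustrative: the 8-hour working day and the 52 handover weeks per year are our assumptions, not survey figures, and it assumes one engineer per team writes the handover each week. With those inputs the result lands close to the range quoted above.

```python
# Back-of-the-envelope estimate of the yearly handover-writing cost per team.
ANNUAL_COMPENSATION = 250_000   # USD, figure used in the article
WORKING_DAYS_PER_YEAR = 250     # figure used in the article
HOURS_PER_DAY = 8               # assumption, not from the survey
WEEKS_PER_YEAR = 52             # assumption: one handover per week, all year


def yearly_handover_cost(hours_per_week: float) -> float:
    """Cost of the time one on-call engineer spends writing handovers in a year."""
    hourly_rate = ANNUAL_COMPENSATION / (WORKING_DAYS_PER_YEAR * HOURS_PER_DAY)
    return hours_per_week * WEEKS_PER_YEAR * hourly_rate


low = yearly_handover_cost(1.5)
high = yearly_handover_cost(2.0)
print(f"${low:,.0f} to ${high:,.0f} per team per year")
```

Changing the assumed number of weeks or hours per day shifts the range, so treat the output as an order-of-magnitude figure.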


The Problem


In this article, we explore the common on-call handover problems engineers run into, especially the ones that still require manual work. Let's dig in.


1. Fragmented tooling


Most teams use a variety of tools during on-call: observability platforms such as Datadog or New Relic, alerting tools such as PagerDuty, and collaboration tools such as Slack. Although many of these integrate with Slack, engineers still have to consult multiple systems to consolidate the details, because many teams write their on-call handover documents in docs or a wiki. Managers can build templates to bring structure to the handover process, but the data gathering itself remains manual.


2. Manual stats gathering


Incident stats are key to staying on top of your on-call process, whether daily or at the end of a rotation. To gather stats on how many incidents were triggered, their severity, and other details, most engineers have to go through Slack channels or log in to alerting tools for a report, then copy the results into the handover wiki or doc. For example, PagerDuty does let you download incident reports from its portal, but that capability is part of its Digital Operations package, which comes with a hefty price tag of at least $50 per user per month. Most small and medium-sized companies cannot justify that cost just for an additional stats report. So the burden ends up on on-call engineers, who may already be exhausted after a rigorous on-call week.
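To make the manual step concrete, here is a minimal sketch of the kind of tally an engineer ends up producing by hand: total incidents, a count per severity, and the list of incidents still open. The `Incident` record, its fields, and the sample data are hypothetical and made up for illustration; real alerting tools expose different fields.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record shape, not the schema of any real alerting tool.
    title: str
    severity: str   # e.g. "SEV1", "SEV2", "SEV3"
    resolved: bool


def summarize(incidents: list[Incident]) -> dict:
    """Tally incidents for a handover: total, count per severity, open items."""
    by_severity = Counter(incident.severity for incident in incidents)
    open_titles = [incident.title for incident in incidents if not incident.resolved]
    return {
        "total": len(incidents),
        "by_severity": dict(by_severity),
        "open": open_titles,
    }


# Made-up sample data for one rotation.
rotation = [
    Incident("API latency spike", "SEV2", True),
    Incident("Checkout errors", "SEV1", True),
    Incident("Disk usage warning", "SEV3", False),
    Incident("Queue backlog", "SEV2", False),
]
stats = summarize(rotation)
print(stats)
```

The counting itself is trivial; the effort in practice goes into collecting the records from several tools in the first place.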


3. Lack of frequent insights for managers and leads


Our survey also highlighted that managers and leads often lack day-to-day visibility into on-call incidents. They do get pulled into high-severity issues, and they gain some insight when they take part in the handover. Beyond that, they have to make a deliberate effort to read through all the Slack threads to get a sense of the number of on-call issues, their statuses, and their impact. Checking observability dashboards daily can help, but those dashboards often show far more detail than a manager or lead needs. Wouldn't it be great to have an easy way to get a sneak peek at the stats at your fingertips? :)



The Solution


We strongly believe there should be an easy way for engineers, managers, and leads to stay on top of on-call stats and metrics. We also believe the best solution is one that is tightly integrated into your daily collaboration tools, such as Slack. To solve most of these problems, we have been working hard on https://www.next9.ai. This app for Slack gives engineers off-the-shelf on-call incident reports aligned with their schedule, which they can use directly in their on-call handover, eliminating manual work. In addition, managers can get on-call summary stats in their Slack channel at whatever cadence they need, daily or weekly, keeping them on top of the on-call process.
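For readers curious what a summary delivered to a Slack channel might look like under the hood, here is a generic sketch. It is not a description of how next9 works. It only builds the JSON body that a Slack incoming webhook accepts (an object with a `text` field) and does not send anything; the function name, the message layout, and the sample numbers are all hypothetical.

```python
import json


def build_slack_payload(period: str, total: int, by_severity: dict[str, int]) -> str:
    """Build a JSON body for a Slack incoming webhook (hypothetical message layout)."""
    lines = [f"*On-call summary ({period})*", f"Total incidents: {total}"]
    # List severities in sorted order so the message is stable between runs.
    for severity in sorted(by_severity):
        lines.append(f"- {severity}: {by_severity[severity]}")
    return json.dumps({"text": "\n".join(lines)})


# Made-up numbers for illustration.
payload = build_slack_payload("weekly", 4, {"SEV2": 2, "SEV1": 1, "SEV3": 1})
print(payload)
```

Posting this body to a webhook URL with any HTTP client would deliver the message to the channel that webhook is configured for.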


We hope this solution eliminates some of the manual work related to on-call and gives you time to innovate! Feel free to reach out at info@next9.ai if you have any questions or feedback to share with us.






