Breaking things and fixing them again is one of the best ways to learn. I learned this lesson early, thanks to my younger sister and her Japanese robotic toy dog. Somehow, I convinced her to let me take apart her robodog so I could see how it works.
“I’ll put it back together. Don’t be such a baby!”
How wrong was I? It would probably have been easier to reassemble a Volkswagen Beetle than this toy dog. There I was, sitting clueless on the floor, surrounded by plastic parts and electronics. My sister was crying and I was sweating, trying to fix everything before our parents returned home. In the end, just in time, the dog was put back together (albeit with some mysterious spare parts hidden in the bin).
Fixing things and building things are very different to one another
Still, I learned a lot that day. I learned that engineering is hard. I learned that breaking things feels bad. I learned that trying to fix things can be stressful. I learned that fixing things and building things are very different to one another. But above all, I learned that trying to fix things is actually a great way to learn.
Introducing triage engineer rotations
I often think of that incident because I’ve found many of those lessons resonate with the way we do things at Intercom, particularly in the way we separate the different processes of building and fixing.
Recently, Brian Scanlan wrote about how we developed an out-of-hours on-call team to deal with emergencies and ensure the best possible uptime for our product while avoiding burnout among engineers.
But we also have a way of optimizing our on-call process during the working week, so that engineers can focus on building rather than being distracted by fixing issues.
We introduced the idea of a triage engineer rotation. Every week, we nominate a triage engineer for the team, whose job is to shield teammates from distractions during working hours. Their teammates, in turn, are able to focus deeply on their goals. But the benefits go much further than fostering better focus.
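A weekly rotation like this is easy to automate. Here is a minimal sketch, assuming a fixed roster and weeks that start on Monday; the names in `TEAM` and the function `triage_engineer_for` are made up for illustration and are not how we actually schedule the rotation.

```python
from datetime import date

# Hypothetical roster; placeholder names, not real teammates.
TEAM = ["alice", "bob", "carol", "dave"]


def triage_engineer_for(day, team=TEAM):
    """Return the teammate on triage for the week containing `day`.

    Weeks are assumed to run Monday to Sunday. Day 1 of Python's
    proleptic ordinal calendar is a Monday, so dividing the ordinal by 7
    gives a week counter that never resets at a year boundary.
    """
    week_index = (day.toordinal() - 1) // 7
    return team[week_index % len(team)]


if __name__ == "__main__":
    print(triage_engineer_for(date.today()))
```

Counting weeks from a fixed origin, rather than using the week number within the year, keeps the rotation from skipping or repeating someone when the year rolls over.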
Triage engineer mission and expectations
Your main mission as the triage engineer is to shield teammates from distractions. That means being the first one to answer any messages regarding the team and the systems you own. You report the status of open issues to the team at the morning stand-up and inform them of anything relevant.
The triage engineer should also manage high-priority issues, investigate new issues, and, if time permits, fix low-priority issues. If some issues require more planning to be resolved, the triage engineer suggests them as next week’s tasks during a planning meeting.
Triaging is nothing more than determining the priority of an emergency. The prerequisite for this process is that each team should have a set of categories covering their area of responsibility. These can be created as labels or tags within your issue-tracking software. We use GitHub for issue tracking.
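To make the idea of labels concrete, here is a rough sketch of ordering a list of issues by a priority label. The label names (`priority:high` and so on), the `Issue` class, and the helper functions are all assumptions for the sake of the example; they are not our real labels and this does not call the GitHub API.

```python
from dataclasses import dataclass, field

# Hypothetical priority labels, lowest number is handled first.
PRIORITY_ORDER = {"priority:high": 0, "priority:medium": 1, "priority:low": 2}
# Issues with no priority label sort after everything else (an arbitrary choice).
UNTRIAGED = len(PRIORITY_ORDER)


@dataclass
class Issue:
    title: str
    labels: set[str] = field(default_factory=set)


def priority_rank(issue):
    """Return the most urgent rank among the issue's priority labels."""
    ranks = [PRIORITY_ORDER[name] for name in issue.labels if name in PRIORITY_ORDER]
    return min(ranks, default=UNTRIAGED)


def triage_queue(issues):
    """Order issues from most to least urgent; ties keep their input order."""
    return sorted(issues, key=priority_rank)


if __name__ == "__main__":
    backlog = [
        Issue("Typo in settings page", {"priority:low", "area:ui"}),
        Issue("Messages not delivered", {"priority:high", "area:delivery"}),
        Issue("New report, not yet looked at"),
    ]
    for issue in triage_queue(backlog):
        print(issue.title)
```

Category labels such as the made-up `area:ui` above are ignored by the ranking, so a team can add as many of them as it needs without affecting the order.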
There are several steps we take while triaging an issue, and the workflow looks like this.