It can be stressful when you lack observability, or "o11y," into your application, and it definitely contributes to fear of deployments.
Yet, we hear that lots of developers have poor visibility into production. With poor visibility, developers are unable to see:
… unless they go through tickets, permissions, and dashboards that don't tell the right story.
But it doesn't have to be like that.
This article shares some o11y best practices that dev teams can apply to improve visibility into their code.
Conventional monitoring tools tend to present issues and performance through the lens of infrastructure resources, not code, which severely limits developers' visibility into their application.
So, the best thing devs can do to improve observability (o11y) into their product is to use a code-centric monitoring tool.
Here are a few simple questions to gauge whether or not your monitoring tool is code-centric:
A code-centric tool is just one way to improve developer visibility into a product. Insight into the user experience is also critical.
We understand that the last thing you want from a monitoring tool is a ton of alerts. Not only can they be disruptive, but they also require a lot of configuration. Still, there are a couple of important alerts you need to be aware of for the sake of the user experience.
You don't need alerts about every little thing when it comes to the user experience. Instead, focus on what is business-critical and nothing else. With this in mind, pick one or two metrics that your users deeply care about and alert on those. Here are a couple of examples:
That's it. Numbers such as these tell you that users cannot use your service. Take it a step further and automatically page on-call developers when your application falls below the threshold you set for a business-critical metric.
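To make the idea concrete, here is a minimal sketch of a check that pages only when a business-critical metric drops below its floor. Everything in it is an assumption for illustration: the metric names, the thresholds, and the `page` callback are hypothetical stand-ins for whatever your monitoring and paging tools actually provide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetricRule:
    """A single business-critical metric and the floor it must stay above."""
    name: str
    minimum: float  # hypothetical threshold; pick one your users care about


def check_metrics(
    readings: Dict[str, float],
    rules: List[MetricRule],
    page: Callable[[str], None],
) -> List[str]:
    """Page once per rule whose current reading is below its minimum.

    A missing reading is treated as a breach, on the assumption that
    "no data" may mean the service is not reachable at all.
    """
    breached = []
    for rule in rules:
        value = readings.get(rule.name)
        if value is None or value < rule.minimum:
            breached.append(rule.name)
            page(f"{rule.name} is {value}, expected at least {rule.minimum}")
    return breached


if __name__ == "__main__":
    # Hypothetical metrics; replace with the one or two that matter to you.
    rules = [
        MetricRule("checkout_success_rate", 0.95),
        MetricRule("login_success_rate", 0.99),
    ]
    readings = {"checkout_success_rate": 0.91, "login_success_rate": 0.995}
    # print stands in for a real paging integration.
    check_metrics(readings, rules, page=print)
```

Keeping the rule list this short is deliberate: with only one or two rules, every page means users are affected, so nobody learns to ignore it.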
This proactive o11y best practice will help minimize bad user experiences. Once you have your metrics in place, it’s time to get proactive about errors.
It's essential to complement alerts critical to the user experience with a healthy curiosity about anomalies and errors that are not yet critical.
By paying attention to non-critical errors during your work hours, you'll reduce the risk of a severe incident. A great way to see non-critical errors is with an error-monitoring tool.
This is what o11y is about - understanding the system from its internal signals to have an informed mental model of how the system works and not some fantasy that looks great on a chart.
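As a rough illustration of reviewing non-critical errors during work hours, the sketch below groups error events by type and code location and lists the most frequent ones. The event fields (`type`, `location`, `severity`) are assumptions made for this example, not the format of any particular error-monitoring tool, which would normally do this grouping for you.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_errors(
    events: Iterable[dict], top: int = 5
) -> List[Tuple[Tuple[str, str], int]]:
    """Count non-critical error events by (error type, code location)."""
    counts = Counter(
        (event["type"], event["location"])
        for event in events
        if event.get("severity") != "critical"  # critical ones already page
    )
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical events, as they might be exported from an error tracker.
    events = [
        {"type": "TimeoutError", "location": "billing/client.py:88", "severity": "warning"},
        {"type": "KeyError", "location": "cart/views.py:41", "severity": "error"},
        {"type": "TimeoutError", "location": "billing/client.py:88", "severity": "warning"},
        {"type": "DatabaseError", "location": "db/pool.py:12", "severity": "critical"},
    ]
    for (error_type, location), count in summarize_errors(events):
        print(f"{count}x {error_type} at {location}")
```

Grouping by code location rather than by host or container keeps the summary tied to something a developer can open and fix.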
When an incident occurs, you need data, especially if it spans different teams, systems, and services.
Severe outages are black swans - they do not happen in ways that we expect them to.
Here are a couple of things you will likely need to fix a completely unknown situation:
Many of these use cases are rare for a single team but relatively frequent for an entire organization. Satisfy them with a centralized platform designed with these cases in mind.
Airbrake Error Monitoring and Performance Monitoring embody these o11y principles for developer-centric monitoring. In as little as three minutes, you'll have access to an error and performance monitoring tool that provides in-depth background information on errors within your code and how they impact your users. See for yourself with a free 14-day trial.