When it comes to debugging, which is better: error monitoring or logging?
As a developer, you know that whenever any activity happens on your servers, a record is created and stored in what is referred to as a log file. These log files hold all data produced by applications, so their value should not be underestimated.
However, simply letting records accumulate in log files will not help us much, given the vast amounts of data collected daily. Left alone, the sheer volume of these records makes them virtually impossible to navigate. In a way, log files become the proverbial “haystack” of data, where our needles of “information” live.
Before error monitoring, you had to engage in a series of logging practices aimed at breaking this data into more usable chunks. This is time-consuming. Error monitoring makes it far easier to push out code without needing to delve into thousands of lines of log data to find the one line that points to your error.
Why Logging Is Not Enough
Knowing what you have to look for
In the past, log files were our primary source of data. At one time or another, we’ve had multiple windows open at a time, grepping for errors on each screen. This approach may work for relatively small collections of data, but it’s not as effective when looking through huge chunks of data.
For example, say we are digging through our logs looking for fatal errors using grep. This approach is effective if we know exactly what we are looking for, but it is not particularly efficient for spotting problems. We are forced to dig through and manually look at individual errors using a variety of different parameters and attributes.
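To make the limitation concrete, here is a minimal Python sketch of a grep-style search. The function name `grep_lines`, the log format, and the sample lines are all assumptions made up for illustration; they do not come from any particular tool.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep_lines(lines: Iterable[str], pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line that matches the regex."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


# Hypothetical log lines; a real log would be read from a file.
SAMPLE_LOG = [
    "2024-01-01 10:00:00 INFO user=42 login ok",
    "2024-01-01 10:00:05 FATAL user=42 database connection lost",
    "2024-01-01 10:00:07 ERROR user=7 timeout calling payments",
    "2024-01-01 10:00:09 FATAL user=13 out of memory",
]

# We only find what we already know to ask for: the word FATAL.
fatal_hits = list(grep_lines(SAMPLE_LOG, r"\bFATAL\b"))
for number, line in fatal_hits:
    print(number, line)
```

The search returns the two FATAL lines and nothing else. The payment timeout on line 3 stays invisible unless we already suspect it and search for it separately.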
For large organizations, moving from grep to Splunk was an epiphany: it provided some automation to the grepping process. However, even with Splunk, we still needed to know what we were looking for to find it.
Looking for patterns
Errors often follow patterns associated with a specific file, deployment, or user.
However, there’s more in the logs than errors. You might find patterns of system slowdowns or bottlenecks that may not be immediately evident by looking at specific messages. Furthermore, we may not notice the gaps between the beginning and end of processes.
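As a rough illustration of how such gaps could be surfaced, the sketch below pairs start and end records and reports anything slower than a threshold. The `START`/`END` line format, the job ids, and the function name `find_slow_jobs` are hypothetical; real logs would need their own parsing.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def find_slow_jobs(lines: Iterable[str], threshold_seconds: float) -> List[Tuple[str, float]]:
    """Pair START/END records per job and return jobs slower than the threshold."""
    started: Dict[str, datetime] = {}
    slow: List[Tuple[str, float]] = []
    for line in lines:
        timestamp, event, job_id = line.split()
        moment = datetime.strptime(timestamp, "%H:%M:%S")
        if event == "START":
            started[job_id] = moment
        elif event == "END" and job_id in started:
            elapsed = (moment - started.pop(job_id)).total_seconds()
            if elapsed > threshold_seconds:
                slow.append((job_id, elapsed))
    return slow


# Hypothetical log: no single line looks wrong, but job-2 took 30 seconds.
SAMPLE_LOG = [
    "10:00:00 START job-1",
    "10:00:01 START job-2",
    "10:00:02 END job-1",
    "10:00:31 END job-2",
]

slow_jobs = find_slow_jobs(SAMPLE_LOG, threshold_seconds=10)
print(slow_jobs)  # [('job-2', 30.0)]
```

None of the four lines is an error on its own; the problem only appears once two of them are read together.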
Even with basic errors, activity by one user can throw off an entire log. We may see a long row of errors, all of which are associated with activity by a specific person, but this may not be obvious in the beginning. It can take forever to notice these patterns within the log file.
Similarly, errors may appear as a result of a specific browser, server, OS, etc.
Essentially, all of this can be summed up in one sentence:
Logging does not define patterns.
This is especially a problem for large log files. Whether it’s a single file or multiple files, there’s simply too much data to sort through.
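The grouping that a reader otherwise has to do by eye can be sketched in a few lines of Python. This assumes a made-up `key=value` log format; the field names `user` and `browser` and the helper names are illustrative only.

```python
import re
from collections import Counter
from typing import Dict, Iterable

FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_fields(line: str) -> Dict[str, str]:
    """Extract key=value pairs from a log line."""
    return dict(FIELD_RE.findall(line))


def count_errors_by(lines: Iterable[str], field: str) -> Counter:
    """Count ERROR lines grouped by the value of one field."""
    counts: Counter = Counter()
    for line in lines:
        if " ERROR " not in line:
            continue
        counts[parse_fields(line).get(field, "unknown")] += 1
    return counts


# Hypothetical log lines in an assumed key=value format.
SAMPLE_LOG = [
    "10:00:01 ERROR user=42 browser=firefox msg=checkout_failed",
    "10:00:02 INFO user=7 browser=chrome msg=page_view",
    "10:00:03 ERROR user=42 browser=firefox msg=checkout_failed",
    "10:00:04 ERROR user=42 browser=firefox msg=cart_failed",
    "10:00:05 ERROR user=13 browser=safari msg=checkout_failed",
]

by_user = count_errors_by(SAMPLE_LOG, "user")
by_browser = count_errors_by(SAMPLE_LOG, "browser")
print(by_user.most_common(1))     # [('42', 3)]
print(by_browser.most_common(1))  # [('firefox', 3)]
```

Three of the four errors trace back to one user on one browser. The raw log never states this; it only emerges once the lines are grouped, and only for the fields we thought to group by.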
The cost of logging
As we have moved to cloud computing, we are now often paying for storage space. Logging can take up a significant portion of available storage due to multiple files being stored on the server's hard disks. If there is a significant amount of activity occurring on these systems, these files can grow to be extremely large in a short amount of time. The more data we choose to log (especially if we are in debug mode), the faster these files fill up. If there are multiple applications running simultaneously, it's easy to run out of system resources.
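One common way to keep log growth in check is to raise the log level outside of debugging sessions and cap file size with rotation. The sketch below uses Python's standard `logging.handlers.RotatingFileHandler`; the logger name, the 1 KB limit, and the temporary directory are arbitrary values chosen for the demonstration, not recommendations.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def build_logger(path: str, debug: bool = False) -> logging.Logger:
    """Create a logger whose files are capped in size and number."""
    logger = logging.getLogger("storage_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug else logging.WARNING)
    logger.handlers.clear()
    logger.propagate = False
    # Keep the active file plus at most two backups, each under 1 KB.
    handler = RotatingFileHandler(path, maxBytes=1024, backupCount=2)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")
logger = build_logger(log_path, debug=False)

for i in range(200):
    logger.debug("cache lookup %d", i)      # dropped: below WARNING
    logger.warning("slow response %d", i)   # written, subject to rotation

logger.handlers[0].close()
log_files = sorted(os.listdir(log_dir))
total_bytes = sum(os.path.getsize(os.path.join(log_dir, name)) for name in log_files)
print(log_files, total_bytes)
```

Rotation bounds the disk usage, but at a price: the oldest entries are deleted, which is exactly how contextual data gets lost.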
Similarly, the very act of searching files for errors or patterns is often memory-intensive. As a result, most people do not keep logs for a long time. Because of this, contextual data gets lost.
If we are dealing with a system with an intensive user interface, the amount of time logging takes can impact users negatively, leading to a lower Apdex score.
Tracking errors across multiple systems and log files can be difficult, particularly if you’re running multiple or duplicate processes. Even with Splunk, you still need to search files for issues. What’s really needed is an error monitoring tool.
Why You Need Error Monitoring
Monitoring is the practice of tracking log files with a separate tool. This helps keep applications available and responding to user or system requests within a reasonable timeframe.
Good error monitoring tracks large amounts of data and presents it to users in the form of easily consumable information. Airbrake is a good example. Airbrake Error Monitoring provides developers with a great deal of information about what's happening in their app via dashboards. With logging, everything is abstract and difficult to find. Monitoring aggregates key information and trends to help you easily pinpoint issues.
Monitoring tools can solve several problems associated with logging. In an application like Airbrake, you don’t have to dump old log files. Instead, this data is aggregated so you can see clear historical patterns, without having to write and run elaborate queries against data that no longer exists.
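The general idea of aggregation can be sketched as follows. This is a simplified illustration, not a description of how Airbrake groups errors: the grouping key (error type plus location), the `ErrorGroup` fields, and the sample events are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ErrorGroup:
    """Summary of every occurrence of one kind of error."""
    error_type: str
    location: str
    count: int = 0
    first_seen: str = ""
    last_seen: str = ""
    users: Set[str] = field(default_factory=set)


def aggregate(events: List[dict]) -> Dict[Tuple[str, str], ErrorGroup]:
    """Collapse individual error events (in time order) into groups."""
    groups: Dict[Tuple[str, str], ErrorGroup] = {}
    for event in events:
        key = (event["type"], event["location"])
        group = groups.setdefault(key, ErrorGroup(*key))
        group.count += 1
        group.first_seen = group.first_seen or event["time"]
        group.last_seen = event["time"]
        group.users.add(event["user"])
    return groups


# Hypothetical error events, already sorted by time.
events = [
    {"time": "2024-01-01T10:00", "type": "TimeoutError", "location": "payments.py:88", "user": "42"},
    {"time": "2024-01-01T10:05", "type": "TimeoutError", "location": "payments.py:88", "user": "7"},
    {"time": "2024-01-02T09:00", "type": "KeyError", "location": "cart.py:12", "user": "42"},
    {"time": "2024-01-03T11:30", "type": "TimeoutError", "location": "payments.py:88", "user": "42"},
]

groups = aggregate(events)
for group in groups.values():
    print(group.error_type, group.location, group.count, group.first_seen, group.last_seen)
```

Each group stays a fixed size no matter how many times the error recurs, which is why a summary like this can be kept long after the raw log lines have been rotated away.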
In addition to this, error and performance monitoring can track deployments, code diffs, and stack traces.
We can also use monitoring tools to identify certain business trends. For example, if there is a large amount of activity occurring at certain times or locations, this could signify that a particular marketing campaign, seasonality, or global event may be impacting a business.
Don’t underestimate the power of error monitoring, especially if you are a small company that can’t afford errors. Monitoring will save you time, money, and resources.
Using Both Error Monitoring and Logging
Of course, monitoring cannot completely eliminate the need for logging. In the end, the two serve separate purposes and functions. Ideally, you want to both log and monitor at the same time.
The logging process is about managing data inside the log files (e.g. aggregation, storing, identification of duplication, the security of data, etc.). Without effective logging, we cannot monitor. The monitoring process exists at a higher level, so you need data to be well-managed and organized before it can be monitored. Once this is done, you’ll quickly see the big picture of what is going on within your application.
When you understand what’s going on within your application, you can then use logging practices to access the raw log files during the debugging process. As you can see, both truly do work together.
Best Practices for Monitoring and Logging
To make the best use of your logging and monitoring tools, it is wise to follow a few best practices.
- Configuration: Set up your system to send log data directly to your monitoring tools. This way we can get a real-time picture of what is occurring at any given point in time. Once configured, your monitoring tools will alert you to various issues before you spot them in the log.
- Triage: Log only data that is necessary for troubleshooting and/or compliance. Set up a series of rules for your log files to store important pieces of data and to discard the rest.
- Structure: Log structured data such as user ID, error type, and operating system. This makes it much easier for your monitoring tools to track common problems.
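The three practices above can be sketched together with Python's standard `logging` module. Everything named here is an assumption for illustration: the class names, the field names `user_id`, `error_type`, and `os`, and the in-memory lists that stand in for a local log file and for a real monitoring tool's client.

```python
import json
import logging
from typing import List


class JsonFormatter(logging.Formatter):
    """Structure: render each record as one JSON object."""
    FIELDS = ("user_id", "error_type", "os")  # hypothetical structured fields

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for name in self.FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload, sort_keys=True)


class TriageFilter(logging.Filter):
    """Triage: keep warnings and above; discard the rest."""

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno >= logging.WARNING


class ListHandler(logging.Handler):
    """Configuration: stand-in destination that collects formatted records."""

    def __init__(self, sink: List[str], level: int) -> None:
        super().__init__(level=level)
        self.sink = sink

    def emit(self, record: logging.LogRecord) -> None:
        self.sink.append(self.format(record))


local_log: List[str] = []     # stands in for the log file
monitor_feed: List[str] = []  # stands in for a monitoring tool

logger = logging.getLogger("best_practices_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.handlers.clear()
logger.addFilter(TriageFilter())
for sink, level in ((local_log, logging.DEBUG), (monitor_feed, logging.ERROR)):
    handler = ListHandler(sink, level)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)

logger.debug("cache miss")                          # discarded by triage
logger.warning("slow query", extra={"user_id": 7})  # kept locally only
logger.error("checkout failed",
             extra={"user_id": 42, "error_type": "PaymentError", "os": "linux"})

print(local_log)
print(monitor_feed)
```

In a real setup, the second handler would hand records to your monitoring tool's own client library rather than a list. The point of the sketch is only the division of labor: the filter decides what is worth keeping, the formatter gives it a consistent shape, and the handlers decide where it goes.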
The best way to catch bugs is to use both logging and error monitoring tools. This way, you’ll see the big picture of what is going on without needing to comb through every single log line to find an error. Furthermore, without logging there is no monitoring, and troubleshooting individual issues can become complicated.
Where can you find a powerful and affordable error monitoring tool? Try Airbrake! Our error monitoring and performance monitoring applications provide insight into your entire app stack. They’re simple to install and work with several languages and frameworks, such as Ruby, Java, Golang, and Python. See for yourself and sign up for a free 14-day trial, which includes unlimited errors, unlimited user accounts, and unlimited projects.