Jan 9, 2018 | DevOps Best Practices

An examination of some of the lesser-known DevOps best practices, including chaos engineering, infrastructure automation, and temporary environments.

DevOps is the practice of combining the philosophies and tools of software development (Dev) and software operation (Ops). For better or worse, DevOps encompasses every aspect of a software system, including actual source code, infrastructure, configuration, data, testing, deployment, staging, production, and so forth. Thus, finding and implementing the most widely accepted DevOps best practices can be a challenge, as there's no absolutely "correct" technique. Instead, improving your teams' DevOps practices is just as much about executing a philosophical shift as it is altering code or scripts in the system.

In today's article we'll explore a wide range of lesser-known DevOps best practices, including chaos engineering, continuous environment testing, infrastructure automation, temporary environments, and more. Let's get to it!

Version Control for All

Most development teams are likely well-accustomed to using version control for application source code. However, a critical DevOps best practice, which many teams may not be using, is to transition from versioning just a handful of elements, like the source code, to versioning everything that is even remotely involved in the application. Everything from source code and configuration to infrastructure and the database should be version controlled. This practice ensures that every aspect of the application, throughout the entire development life cycle, can be historically traced back through a single, ultimate origin: the repository. It maintains a constant and explicit connection between all components of the application, so there's never any doubt which version of the database was used with which version of the source code.

While most teams understand the concept of versioning everything, few organizations actually implement the practice to the fullest extent. A partial implementation that versions most, yet not all, aspects of the application largely defeats the purpose and undermines the foundation and stability of the application architecture. Most teams version control their source code. A good percentage of teams also version their configuration. A small sampling of teams will even version their database. Unfortunately, that's typically where versioning stops in most software projects. Ideally, everything should be versioned -- not just the most common components. This includes all package dependencies, database migration scripts, and the database data itself.

The primary goal of versioning everything can be summed up as follows: Any (adequately privileged) team member should be able to recreate a specific version of the complete application stack by issuing a single command. If the entire system has been properly designed and versioned throughout development, this process should merely be a matter of executing one script, which itself triggers all sub-scripts necessary to create every component of the software.

The steps for implementing full versioning are beyond the scope of this article, but we'll explore a few explicit concepts in the sections to follow. The general idea is that every component of the software should be defined and generated by a source script. Whether the script generates or modifies infrastructure, alters data, executes a deployment, or configures the application, if it is properly written and version controlled, you and your team can easily access and execute scripts from any iteration of the software you need.
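
To make this concrete, here's a minimal sketch of what such a top-level script could look like. It's written in Python purely for illustration, and the sub-script names (provision_infrastructure.sh, configure_servers.sh, migrate_database.sh, deploy_application.sh) are hypothetical placeholders rather than any specific toolchain:

    # recreate_stack.py -- hypothetical top-level script that rebuilds the
    # complete application stack for a given version tag. Every sub-script
    # is assumed to live in the same version-controlled repository.
    import subprocess
    import sys

    # One versioned sub-script per component of the stack (hypothetical names).
    SUB_SCRIPTS = [
        "./scripts/provision_infrastructure.sh",
        "./scripts/configure_servers.sh",
        "./scripts/migrate_database.sh",
        "./scripts/deploy_application.sh",
    ]

    def recreate(version):
        """Check out the requested version, then run every sub-script in order."""
        # Because everything is versioned, checking out one tag pins the
        # infrastructure, configuration, migrations, and source code to the
        # same point in history.
        subprocess.run(["git", "checkout", version], check=True)
        for script in SUB_SCRIPTS:
            subprocess.run([script, version], check=True)

    if __name__ == "__main__":
        # Usage: python recreate_stack.py v1.4.2
        recreate(sys.argv[1])

The specific tooling doesn't matter; what matters is that one versioned entry point fans out to every other versioned script, so rebuilding a given release next month produces the same stack it produces today.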

Throughout the industry, most of these techniques are better known as X as code, such as infrastructure as code, policy as code, and so forth. The ultimate goal is to introduce idempotence into the application and infrastructure that runs it, which essentially means that a scripted release of one version of the software will be identical to another release of that same version, every time it's executed.

Automate Your Infrastructure

Implementing infrastructure as code practices revolves around the basic concept of infrastructure automation -- creating scripts that can perform every step of infrastructure creation, including server provisioning, OS installation, configuration, and so forth. With a proper script, infrastructure configuration is no longer tied to a single machine or cluster, but can be copied and repeated ad nauseam, for as many nodes as needed. Moreover, by explicitly defining every step in the process inside an automated script, multiple team members can alter and improve the process throughout the development life cycle, so the latest version is always the most robust and well-tested.

There are many different tools for handling infrastructure automation, but the most popular and well-established are Puppet and Chef. Both have their own pros and cons, but either choice will provide a solid foundation on which you and your team can generate infrastructure as code scripts with relative ease. Puppet is arguably the more dominant tool, with a great many high-profile organizations using its software for infrastructure implementation and management. On the other hand, Chef has a robust online learning tool, with tracks that will guide you through the process of exploring everything Chef is capable of.

Both Puppet and Chef use Ruby-based domain-specific languages to create configuration scripts and execute commands. Most importantly, both tools are idempotent, so a given configuration script produces the exact same result every time you execute it.
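
To see what idempotence means in practice -- in plain Python rather than in Puppet's or Chef's actual DSLs -- consider a hypothetical "resource" that describes a desired state and only acts when the real state differs. Running it once or a hundred times leaves the machine in the same place:

    # Illustration of an idempotent resource, loosely modeled on what
    # configuration-management tools do internally. Not Puppet or Chef code.
    import os

    def ensure_directory(path, mode=0o755):
        """Ensure a directory exists with the given permissions.

        Returns True if a change was made, False if the system already
        matched the desired state (the idempotent no-op case).
        """
        if os.path.isdir(path):
            current_mode = os.stat(path).st_mode & 0o777
            if current_mode == mode:
                return False          # Already converged; do nothing.
            os.chmod(path, mode)      # Only correct the drift.
            return True
        os.makedirs(path, mode=mode)  # Create it from scratch.
        return True

    # The first call reports a change; every call after that is a no-op.
    print(ensure_directory("/tmp/app-config"))
    print(ensure_directory("/tmp/app-config"))

Puppet and Chef apply the same check-then-converge logic to packages, services, files, and users, which is why replaying a configuration script never leaves the system in a surprising state.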

Embrace the Principles of Chaos Engineering

Chaos Engineering is the idea that modern distributed software systems are prone to experiencing random, turbulent conditions and, therefore, such systems should be designed to withstand unexpected problems and weaknesses in production environments. Chaos Engineering centers on four basic steps for running experiments that probe the weaknesses of the software system (a simplified sketch of this loop follows the list):

  • Start by defining steady state as some measurable output of a system that indicates normal behavior.
  • Hypothesize that this steady state will continue in both the control group and the experimental group.
  • Introduce variables that reflect real-world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
  • Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
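
Here's a drastically simplified sketch of that experiment loop in Python. The steady-state metric, the 0.95 threshold, and the failure-injection stub are all hypothetical stand-ins for whatever metrics and tooling your own system exposes:

    # Sketch of a chaos experiment following the four steps above.
    # Metric, threshold, and failure injection are hypothetical placeholders.
    import random

    def measure_steady_state():
        """Step 1: measure some output that indicates normal behavior.
        Faked here; a real experiment would query production metrics,
        such as the rate of successfully completed requests."""
        return random.uniform(0.97, 1.0)

    def inject_failure():
        """Step 3: introduce a real-world event, e.g., terminating a
        non-critical service instance. Stubbed out for illustration."""
        print("Terminating one non-critical service instance...")

    def run_experiment(threshold=0.95, max_drop=0.02):
        control = measure_steady_state()       # control group
        inject_failure()
        experimental = measure_steady_state()  # experimental group

        # Steps 2 and 4: we hypothesized that steady state would hold in
        # both groups; a significant difference disproves the hypothesis
        # and exposes a weakness to fix.
        if experimental < threshold or (control - experimental) > max_drop:
            print(f"Weakness found: success rate fell from {control:.3f} to {experimental:.3f}")
        else:
            print(f"Hypothesis held: {control:.3f} vs {experimental:.3f}")

    run_experiment()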

There are many ways that different organizations try to implement Chaos Engineering practices. Netflix developers, for example, created (and open-sourced) their internal Chaos Monkey tool, which randomly terminates virtual machine instances and containers within the production environment. While this may be a rather extreme approach to take right out of the gate, many teams find success by first experimenting with the randomized failure of non-critical services and instances. Once the team has developed techniques and automated responses that can handle these lesser services, then critical services are also introduced into the mix of potential failures.

The goal is simple: Force your team to develop software systems that can instantly and automatically adjust to failures within any and all components, without losing the core functionality of the software.

Deploy to Temporary Environments

A major benefit of automating your infrastructure is that your software is no longer tied to a single server instance or node. There's no longer any need for developers and IT admins to spend hours slowly establishing and configuring the infrastructure components necessary to run a particular version of the application. When manual intervention is required to establish a development, staging, testing, production, or any other type of environment, there will inevitably be problems and slight differences from one environment to the next, which can lead to a wide range of unexpected behaviors and bugs.

The solution is simple: Implement a policy of using temporary environments for everything, save perhaps the production environment itself. The goal is that any given environment should only exist for a short period of time, typically just long enough to execute the full suite of automated tests. With automated infrastructure practices and scripts already in place, a temporary environment can be created, tested, and destroyed -- all without manual, human intervention.
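
A bare-bones version of that lifecycle might look like the sketch below. The create_environment.sh, run_tests.sh, and destroy_environment.sh scripts are hypothetical stand-ins for your own provisioning tooling; the important part is the create-test-destroy shape and the guarantee that teardown always happens:

    # Create a short-lived environment, run the test suite against it, and
    # tear it down no matter what happens. Script names are hypothetical.
    import subprocess
    import uuid

    def test_in_temporary_environment(version):
        env_name = "temp-" + uuid.uuid4().hex[:8]   # unique, disposable name
        subprocess.run(["./scripts/create_environment.sh", env_name, version], check=True)
        try:
            # Run the full automated test suite against the fresh environment.
            result = subprocess.run(["./scripts/run_tests.sh", env_name])
            return result.returncode
        finally:
            # The environment is destroyed even if the tests fail or error out.
            subprocess.run(["./scripts/destroy_environment.sh", env_name], check=True)

    if __name__ == "__main__":
        raise SystemExit(test_in_temporary_environment("v1.4.2"))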

It may feel counter-intuitive at first, but there are a number of major advantages to using temporary environments. The first and most critical benefit is the removal of environment-specific dependencies. Your software will no longer be explicitly tied to a singular environment, but can instead be deployed and executed at will, on any environment that may be necessary at the time.

The other major benefit of a temporary environment policy is that automation becomes a necessity. Since environments are frequently created and automatically destroyed after a matter of days, if not hours, the entire team is forced to adapt and generate automated scripts that can handle all environment provisioning, which strengthens the entire infrastructure with every single addition or change to the scripts.

Implement Continuous Environment Testing

Continuous practices are becoming standard in many modern organizations. Continuous deployment, continuous integration, and continuous testing are common norms, but few organizations apply these same principles of continuous integration and testing to the environment on which the software runs. As infrastructure as code practices and principles are implemented, it becomes necessary to verify the health and stability of these provisioned environments by adding continuous environment testing into the mix.

Luckily, the concepts and practices behind automated environment testing are simple. Start by defining what steps should be performed in a test, then write the automated test that describes those steps. From there, a script must be created that actually executes the actions defined in the test. Lastly, every test must be included in a larger script that can automatically execute everything without human intervention. This ensures that the process fits the continuous definition that the other continuous practices rely upon. As discussed above, everything that defines these environment tests should also be versioned.

These tests need not be complicated, as they're often checking simple parameters of the environment, like what version of a package or software is installed. It's the combination of all these environmental tests -- and the ability to add and modify them as development progresses -- that will provide you and your team with more robust DevOps best practices.
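
For instance, a single environment test might do nothing more than confirm that the expected runtime is installed. The check below (any Python 3.x interpreter on the PATH) is just a placeholder; the structure -- a small, versioned, automatically executed assertion about the environment -- is what matters:

    # A tiny environment test: assert that the provisioned environment has
    # the expected interpreter installed. The expected value is a placeholder.
    import subprocess
    import unittest

    EXPECTED_PREFIX = "Python 3."  # e.g., require any 3.x interpreter

    class EnvironmentTest(unittest.TestCase):
        def test_python_version(self):
            # `python3 --version` prints something like "Python 3.11.4".
            output = subprocess.run(
                ["python3", "--version"], capture_output=True, text=True, check=True
            ).stdout.strip()
            self.assertTrue(output.startswith(EXPECTED_PREFIX))

    if __name__ == "__main__":
        unittest.main()

Dozens of small checks like this, versioned alongside the provisioning scripts and run against every freshly created environment, catch configuration drift long before it ever reaches production.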

Moreover, continuous testing practices also need to scale with your software application itself, by keeping you informed the moment something goes wrong in development, staging, or even production environments. That's why Airbrake's powerful error monitoring software guarantees that your team won't need to worry about losing track of production errors! Airbrake provides real-time error monitoring and automatic exception reporting for all your development projects. Airbrake's state of the art web dashboard ensures you receive round-the-clock status updates on your application's health and error rates. No matter what you're working on, Airbrake easily integrates with all the most popular languages and frameworks. Plus, Airbrake makes it easy to customize exception parameters, while giving you complete control of the active error filter system, so you only gather the errors that matter most.

Check out Airbrake's error monitoring software today and see for yourself why so many of the world's best engineering teams use Airbrake to revolutionize their exception handling practices!

Written By: Frances Banks