DevOps Tools Introduction #12: IT Operations and Monitoring

While we have talked a lot about application deployment, we still need to cover how to keep these applications up and running. The DevOps Tools Engineer exam covers IT Operations and Monitoring in objective 705.1.

The core technology of this objective is the monitoring tool Prometheus. Its documentation explains how to install Prometheus, including how to run it from a Docker image. Once Prometheus is running, follow the Getting Started guide, which shows how to configure Prometheus to monitor itself.

Monitoring other systems and applications requires exporters, which collect monitoring information and expose it to Prometheus. The Node exporter makes status information about a Linux system available for monitoring. To collect metrics which are not constantly available, for example from short-lived batch jobs, the Prometheus Pushgateway can cache information until it is collected. There are numerous other exporters for various purposes. Take a look at the Exporters documentation and explore exporters you might find useful for your infrastructure.
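To get a feeling for what an exporter does, here is a minimal sketch in Python that serves one gauge in the Prometheus text exposition format, using only the standard library. It is not the Node exporter and not an official client library; the metric name demo_load1, the port 8000 and the helper names render_gauge and serve are assumptions made up for this illustration.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_gauge(name: str, help_text: str, value: float) -> str:
    """Render a single gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics with the current 1-minute load average."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # os.getloadavg() is only available on Unix-like systems.
        load1, _, _ = os.getloadavg()
        # demo_load1 is a hypothetical metric name chosen for this sketch.
        body = render_gauge("demo_load1", "1-minute load average.", load1)
        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Run the sketch exporter until interrupted (port 8000 is an assumption)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and opening http://localhost:8000/metrics in a browser shows the plain-text output that Prometheus would scrape. For anything beyond experiments, an existing exporter or an official client library is probably the better choice.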

While collecting metrics is necessary, it is not sufficient for building a reliable monitoring solution. In addition, alerts have to be sent upon certain events. Prometheus’ Alertmanager handles alerting and provides a flexible configuration which defines the exact circumstances under which alerts are sent. Besides alerting, reporting and visualization tools are another way to access monitoring data. Prometheus integrates with Grafana, an advanced tool to analyze and visualize data. Once you have gathered some monitoring data, take a look at the Grafana documentation and build some dashboards of your own. If you need some inspiration, search for images of other people’s dashboards to see what they get out of their monitoring data.
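The idea that an alert should only be sent under well-defined circumstances can be illustrated with a small sketch. Prometheus alerting rules can require a condition to hold for some time before an alert fires; the Python class below imitates that behavior in a simplified form. It is neither Prometheus nor Alertmanager code, and the names ThresholdAlert, threshold and hold_seconds are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdAlert:
    """Fires only after a value has stayed above a threshold long enough."""

    threshold: float
    hold_seconds: float
    breach_started: Optional[float] = None  # timestamp of the first breach

    def observe(self, timestamp: float, value: float) -> str:
        """Feed one sample and return 'inactive', 'pending' or 'firing'."""
        if value <= self.threshold:
            # The condition no longer holds, so the timer is reset.
            self.breach_started = None
            return "inactive"
        if self.breach_started is None:
            self.breach_started = timestamp
        if timestamp - self.breach_started >= self.hold_seconds:
            return "firing"
        return "pending"
```

Feeding it samples at 0, 30 and 60 seconds that all exceed the threshold, with hold_seconds set to 60, yields pending, pending and firing. A single short spike therefore never reaches the firing state, which is one way to keep alerts meaningful.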

Deciding which metrics to monitor is crucial for useful monitoring. Anita Buehrle’s article RED Method for Prometheus discusses handling such metrics. SmartBear’s article Understanding Performance Metrics for Monitoring goes into even more detail. Philipp Winder wrote a great Introduction to Monitoring Microservices with Prometheus and provides some samples of how to implement microservice monitoring in Go and Java. Even though language-specific implementations are not tested in the DevOps Tools Engineer exam, the examples are useful for understanding how to make applications easily monitorable.
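RED stands for rate, errors and duration. As a rough illustration of what these three numbers mean, the following Python sketch derives them from a list of request records. It does not come from any of the articles mentioned above; the Request type and the red_summary function are hypothetical, and treating status codes of 500 and above as errors is an assumption. Real setups usually look at the distribution of durations rather than only the mean used here for brevity.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Request:
    """One handled request (hypothetical record type for this sketch)."""

    status: int
    duration_seconds: float


def red_summary(requests: List[Request], window_seconds: float) -> Dict[str, float]:
    """Summarize rate, errors and duration for one observation window."""
    total = len(requests)
    # Assumption: responses with status 500 or above count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_per_second": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "mean_duration_seconds": (
            mean(r.duration_seconds for r in requests) if total else 0.0
        ),
    }
```

For four requests observed within two seconds, one of which failed, red_summary reports a rate of 2.0 requests per second and an error ratio of 0.25.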

Security vulnerabilities are among the biggest threats to the availability of a service. We have already mentioned the Open Web Application Security Project (OWASP) in an earlier posting. Today we will revisit security to learn how brute force, buffer overflow and denial of service attacks work. Browse OWASP’s attack category to get an overview of other threats beyond those mentioned in the exam objectives.

And don’t forget to think about how to prevent such attacks. The paper Security threats and their mitigation in infrastructure as a service gives a good overview. The exam objectives explicitly mention firewall types, which are well explained in Kim Crawley’s articles about how firewalls work.
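As one example of such prevention, a common measure against brute force attacks is to limit the number of failed login attempts per client within a time window. The Python sketch below shows the idea only; it is not taken from the paper or the articles mentioned above, and the class name LoginThrottle as well as the default limits are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class LoginThrottle:
    """Blocks a client after too many failed logins in a sliding window."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 300.0):
        # Both defaults are arbitrary values picked for this sketch.
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # client -> failure timestamps

    def _prune(self, client: str, now: float) -> deque:
        """Drop failures that are older than the window and return the rest."""
        recent = self._failures[client]
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        return recent

    def record_failure(self, client: str, now: float) -> None:
        """Remember one failed login attempt of the given client."""
        self._prune(client, now).append(now)

    def is_blocked(self, client: str, now: float) -> bool:
        """Return True while the client has reached the failure limit."""
        return len(self._prune(client, now)) >= self.max_failures
```

An application would call record_failure after each rejected login and check is_blocked before evaluating the next attempt. Since old failures drop out of the window, a blocked client is allowed to try again after some time.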

While clouds, VMs and containers are easy interfaces to computing resources, physical IT resources are the base for any application. Dejan Lukan explains the relationship between virtualization and cloud computing. Leave some time to reflect upon this relationship. Go through the steps in the deployment of your applications and ask yourself which new VMs and containers are created and which configuration changes are made to load balancers and other infrastructure components. Don’t focus too much on the individual components; focus on their role in the deployment instead.

You might recall a lot of the security- and infrastructure-related terms from earlier postings in this series. Here we see how development and operations, and how applications and infrastructure, interact. Next week we will discuss the last missing piece in this technology stack and talk about log management.


About Fabian Thorns:

Fabian Thorns is the Director of Product Development at Linux Professional Institute, LPI. He holds an M.Sc. in Business Information Systems, is a regular speaker at open source events and is the author of numerous articles and books. Fabian has been part of the exam development team since 2010. Connect with him on LinkedIn, XING or via email (fthorns at www.lpi.org).
