Infrastructure-as-Code series: Practical monitoring in an IaC universe

The Computer Weekly Developer Network (CWDN) now starts its Infrastructure-as-Code (IaC) series of technical analysis discussions to uncover what this layer of the global IT fabric really means, how it integrates with the current push to orchestrate increasingly cloud-native systems more efficiently and what it means for software application development professionals now looking to take advantage of its core technology proposition.

This piece is written by Rachel Berry in her capacity as product consultant at EG Innovations, a global application and infrastructure monitoring tool company whose products cover every layer and tier of an IT environment, with automated discovery, monitoring and root-cause diagnosis.

A prolific software professional and research award winner, Berry’s own Virtually Visual pages are linked here.

Berry writes as follows:

IaC and monitoring tools

Infrastructure-as-Code (IaC) is becoming ubiquitous for defining and automating the provisioning and deployment of IT infrastructure.

Higher-level languages and scripts are used to define the infrastructure that applications run on, including networking, servers, data storage and so on. This is happening as auto-scaling and auto-deployment also become the norm, and IaC may also define how infrastructure should scale up or down to meet the needs of applications.

EG Enterprise provides unified APM (Application Performance Monitoring), infrastructure monitoring and cloud monitoring. As such, we have been architecting our product to work within the new normal of IaC and to overcome the challenges that traditional monitoring tools will face.

A common customer scenario we encounter is a SaaS service built from Java apps deployed in containers on application servers such as Tomcat or WebLogic, often running on public clouds such as Azure and AWS.

As demand fluctuates, frameworks such as Kubernetes are used to spin up new infrastructure (e.g. additional servers) so that key metrics such as application response times are maintained. The business-critical need, though, is the performance and availability of the apps and services, so monitoring infrastructure that may be ephemeral and automatically deployed (or destroyed) is essential.
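To make the scaling step concrete, here is a minimal Python sketch of the proportional rule that the Kubernetes documentation gives for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric ÷ target metric), clamped to declared bounds. The function name, default bounds and example metric values are illustrative assumptions, and the real controller also applies stabilisation windows and pod-readiness handling that this sketch leaves out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count from the HPA-style proportional rule (illustrative sketch)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Do nothing if the metric is already close to target (HPA's default tolerance is 0.1).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minimum/maximum the IaC definition declares.
    return max(min_replicas, min(max_replicas, desired))
```

For example, three replicas averaging 90 ms response time against a 60 ms target would scale to five. Every replica added or removed this way is a component the monitoring tool must pick up or drop without manual help.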

Monitoring infrastructure typically involves many steps. Traditionally, a system administrator maintaining a static on-premises deployment would have performed tasks such as the following manually, gaining monitoring insight via a console:

Installing agents to harvest metrics, e.g. within a Windows OS to collect Perfmon counters.
Configuring what has to be monitored on a server.
Setting the credentials needed for monitoring.
Making any changes needed in an app stack to support monitoring (e.g., for APM).
Setting and tuning thresholds, if required, to trigger alerts, e.g. if a server's CPU usage exceeds 90%.
Assigning the monitored component to the respective teams with dashboards and via ITSM integrations.
Removing components from monitoring, e.g. when servers are decommissioned.
Unassigning the removed components from the respective teams.
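In an IaC pipeline, the add, remove, assign and unassign steps above largely reduce to reconciling the inventory the IaC code declares against what the monitoring tool currently knows about. The Python sketch below only computes that difference; the component names are hypothetical, and a real pipeline would then call its monitoring tool's CLI or API for each resulting action.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringPlan:
    """Actions needed to bring monitoring in line with the IaC-declared inventory."""
    to_add: list[str] = field(default_factory=list)
    to_remove: list[str] = field(default_factory=list)


def reconcile(declared: set[str], monitored: set[str]) -> MonitoringPlan:
    # Declared by IaC but not yet monitored: needs agent, config and team assignment.
    to_add = sorted(declared - monitored)
    # Monitored but no longer declared (decommissioned): remove and unassign.
    to_remove = sorted(monitored - declared)
    return MonitoringPlan(to_add=to_add, to_remove=to_remove)
```

Because the plan is computed from state rather than from a list of past changes, running it on every deployment is safe: an already-reconciled environment simply produces an empty plan.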

When implementing IaC, the ultimate goal is full automation and the removal of manual intervention and effort. Ideally, everything would be set up automatically, including comprehensive monitoring of the infrastructure. This means that, as a monitoring vendor, we need to provide features that allow monitoring itself to be automated.

The administrator or architect designing and deploying IaC will then need to leverage technologies from monitoring vendors or script their own functionality and consider requirements including:

No manual installation of agents should be necessary; a silent, automated way to install agents is needed.
Either the IaC code should configure (for example) a server for monitoring, or auto-discovery and auto-configuration must be enabled.
Some monitoring products built around AIOps engines will automatically baseline usage and set thresholds without manual configuration, automating alerts for anomalous behaviour and events on dynamically changing infrastructure.
How corrective actions can allocate or deallocate resources, such as additional virtual machines, based on parameters like system resource consumption, TCP connection utilisation and so on.
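AIOps baselining in commercial products is typically more sophisticated than this, but a simple statistical stand-in (mean plus k standard deviations of recent samples) illustrates the idea of deriving an alert threshold from observed behaviour rather than a hand-set value such as 90% CPU. This is an assumed technique chosen for illustration, not how any particular product computes its baselines; the value of k and the sample data are arbitrary.

```python
import statistics


def baseline_threshold(history: list[float], k: float = 3.0) -> float:
    """Upper alert threshold = mean + k * sample standard deviation of history."""
    if len(history) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    return statistics.fmean(history) + k * statistics.stdev(history)


def is_anomalous(value: float, history: list[float], k: float = 3.0) -> bool:
    # Flag values above the learned baseline instead of a fixed threshold.
    return value > baseline_threshold(history, k)
```

With a CPU history of 50, 52, 48 and 50%, the threshold works out at roughly 54.9%, so a reading of 60% would alert while 53% would not. A newly auto-scaled server would need some history before its baseline is meaningful.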

Configuring monitoring

If you need to configure monitoring on (for example) a server using code, you will need to use either a CLI or API supplied by the monitoring tool and implement functionality such as:

Credentials: defaults must be provided so that no component-specific configuration is needed. If a specific configuration is preferred, though, an API/CLI should be provided for this.
App stack changes: these must be possible through script configuration, e.g. setting environment variables in scripts and passing them to the app.
Assignments: either the monitoring tool should handle these, or you will need to be able to make them in code using the CLI/API.
Removal: the CLI/API should provide functionality to remove a component.
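For app-stack changes on a Java service, one common way to attach an APM agent without modifying the application is the `JAVA_TOOL_OPTIONS` environment variable, which HotSpot JVMs read at start-up, combined with the standard `-javaagent` option. The sketch below adds that flag idempotently to a container's environment. The agent path is a hypothetical placeholder, and whether a given monitoring tool uses this mechanism (rather than, say, Tomcat's `CATALINA_OPTS`) is an assumption to check against its documentation.

```python
# Hypothetical location of the monitoring vendor's Java agent in the container image.
AGENT_JAR = "/opt/monitoring/agent.jar"


def with_java_agent(env: dict[str, str], agent_jar: str = AGENT_JAR) -> dict[str, str]:
    """Return a copy of env with -javaagent appended to JAVA_TOOL_OPTIONS.

    Assumes existing options contain no quoted values with embedded spaces.
    """
    flag = f"-javaagent:{agent_jar}"
    updated = dict(env)
    options = updated.get("JAVA_TOOL_OPTIONS", "").split()
    if flag not in options:  # idempotent: safe to apply on every deployment
        options.append(flag)
    updated["JAVA_TOOL_OPTIONS"] = " ".join(options)
    return updated
```

Existing options such as heap settings are preserved, and applying the function twice leaves the environment unchanged.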

So, when evaluating monitoring tools, those implementing IaC need to consider whether the tools support these key requirements:

Silent install scripts for agents.
A supported CLI or API that lets the IaC code perform the necessary steps automatically.
Whether auto-discovery is supported, as this will minimise the amount of IaC code that needs to be written. This type of auto-discovery is also needed for dynamic, auto-scaling environments, so it may be a requirement regardless of the IaC goals.

Public cloud vendors charge their users based on utilisation, so IaC may need to consume billing data from APIs/CLIs as an input to scaling and allocation decisions.
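As a small illustration of billing data feeding a scaling decision, the Python sketch below caps a desired replica count by what an hourly budget can pay for. The flat per-instance hourly rate and the budget figures are hypothetical; in practice they would come from the cloud provider's billing or pricing APIs, and real pricing is rarely this simple.

```python
import math


def budget_capped_replicas(desired: int, hourly_cost_per_instance: float,
                           hourly_budget: float, min_replicas: int = 1) -> int:
    """Limit a scaling decision to what the budget allows (illustrative sketch)."""
    if hourly_cost_per_instance <= 0:
        raise ValueError("hourly_cost_per_instance must be positive")
    affordable = math.floor(hourly_budget / hourly_cost_per_instance)
    # Never drop below the availability floor, even if that exceeds the budget.
    return max(min_replicas, min(desired, affordable))
```

Keeping the availability floor above the budget cap is a design choice: an over-budget but available service is usually easier to justify than an outage, and the overspend can be raised as an alert instead.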

Berry: Magnificently masked to monitor, manage and migrate.

In some scenarios, agentless monitoring may be an option. In that case you may not have to install an agent on each device, but you will have to map each device being added to an existing agent. The logic to do this (who to contact and when) will then also need to be coded into the IaC scripting.
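The mapping logic for agentless monitoring could, for example, assign each new device to the least-loaded remote agent at the same site. The Python sketch below assumes a hypothetical `RemoteAgent` model with a per-agent device capacity; real products will have their own criteria (network reachability, credentials, licensing) for choosing which agent polls a device.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteAgent:
    """Hypothetical model of an existing agent that polls devices remotely."""
    name: str
    site: str
    capacity: int
    devices: list[str] = field(default_factory=list)


def assign_device(device: str, site: str, agents: list[RemoteAgent]) -> RemoteAgent:
    # Only agents at the device's site with spare capacity are candidates.
    candidates = [a for a in agents if a.site == site and len(a.devices) < a.capacity]
    if not candidates:
        raise RuntimeError(f"no remote agent available for {device} at {site}")
    # Least-loaded agent wins; ties broken by name so IaC runs are deterministic.
    chosen = min(candidates, key=lambda a: (len(a.devices), a.name))
    chosen.devices.append(device)
    return chosen
```

Raising an error when no agent has capacity, rather than silently skipping the device, makes the gap visible to the IaC run so that another agent can be provisioned.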

All of these monitoring requirements also need to be implemented with security as a forethought, and as a vendor we have to build features with secure communication in mind, especially around auto-discovery and agent communication.

IaC is undoubtedly the future, but beyond automating the creation and change of infrastructure, those enabling it must build functionality such as well-architected monitoring into their designs. That way they have manageable infrastructure they have insight into, not only for troubleshooting but also for long-term capacity planning, security and optimising usage against costs.