If it isn’t monitored, it doesn’t exist. Monitoring and alerting are core operational components of almost any infrastructure. Finding the right balance between which data to collect and what to alert on is a challenging and ongoing task. Workloads, traffic patterns, data and software evolve, which can affect performance and availability. This is even more challenging in an elastic environment, where machines or containers are created and deleted automatically based on demand.
False alerts are disruptive and, if persistent, create apathy. We set ourselves the ambitious goal of zero false alerts.
There are many open-source and commercial monitoring solutions on the market. Which system, or combination of systems, works best depends on individual requirements and environment. We commonly work with the following solutions:
- CloudWatch
- Grafana
- LibreNMS
- Netdata
- Prometheus
- Zabbix