Sensu is near the top of the coolest projects I’ve worked on in the recent past. It’s a recommended alternative to Nagios. We started monitoring our OpenStack cloud infrastructure end to end using Sensu at a crucial time, when the org was moving towards a highly available private cloud infrastructure to support a huge customer base.
Key points on Sensu:
- A more scalable solution than Nagios for monitoring cloud infrastructure.
- Subscription based: when a new machine comes up, it identifies itself (for example, ‘I am a webserver’) and is tagged with the checks relevant to that subscription. With automation like Chef, this eliminates the need to install agents by hand or define clients on the server side. Easy to manage with a configuration management system (CMS)!
- Supports plugins written in different languages. Nagios plugins are easy to reuse.
- RabbitMQ is used for messaging, and Redis (a NoSQL database) for data storage.
- Lightweight application written in Ruby.
- We had several availability issues with the Nagira API for Nagios. Compared to that, I feel the Sensu API performs much better.
- Configuration and API data are in JSON format.
- Standalone checks: sensu-client can run its own standalone checks and report the results to the server.
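To make the subscription idea concrete, here is a minimal Python sketch of how checks could be matched to a client by subscription. It is a simplified model, not Sensu's actual code. The attribute names (`subscriptions`, `subscribers`, `standalone`, `interval`) follow Sensu's JSON configuration, but the client name, address, check names and command paths are hypothetical placeholders.

```python
import json

# Client definition, as the client would announce itself (hypothetical values).
CLIENT = {
    "client": {
        "name": "web-01",
        "address": "10.0.0.5",
        "subscriptions": ["webserver"],
    }
}

# Check definitions (commands are placeholders, not real plugin paths).
CHECKS = {
    "check_http": {
        "command": "/path/to/check_http_example",
        "subscribers": ["webserver"],
        "interval": 60,
    },
    "check_mysql": {
        "command": "/path/to/check_mysql_example",
        "subscribers": ["database"],
        "interval": 60,
    },
    # A standalone check lives in the client's own config and is scheduled
    # by the client itself, so it has no subscribers.
    "check_disk": {
        "command": "/path/to/check_disk_example",
        "standalone": True,
        "interval": 300,
    },
}


def subscription_checks(client, checks):
    """Names of the checks that reach this client through its subscriptions."""
    subs = set(client.get("subscriptions", []))
    return sorted(
        name
        for name, check in checks.items()
        if not check.get("standalone")
        and subs & set(check.get("subscribers", []))
    )


def standalone_checks(checks):
    """Names of the checks the client schedules on its own."""
    return sorted(name for name, check in checks.items() if check.get("standalone"))


if __name__ == "__main__":
    print(json.dumps(CLIENT, indent=2))
    print(subscription_checks(CLIENT["client"], CHECKS))
    print(standalone_checks(CHECKS))
```

A new database node would only need `"subscriptions": ["database"]` in its client definition to pick up `check_mysql`; nothing has to change on the server side.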
Our Sensu architecture involves these components:
- Sensu-Server, RabbitMQ, Redis and the Uchiwa dashboard in a two-node cluster (we plan to expand to four nodes across different data centers)
- High availability: RabbitMQ clustering and Redis Sentinel. The Sensu servers assign the master automatically.
- Redis is deployed on 3 nodes with Sentinel. If the master Redis goes down, Sentinel automatically promotes a slave to master and it is “business as usual”.
- Here is a quick reference to implementing HA for Sensu
- In our setup we run HAProxy on all the nodes and created a global VIP to access the components.
- Integrated handlers to send alerts to HP Operations Manager for the NOC team.
- Configured several metric checks to feed into Graphite (https://github.com/opower/sensu-metrics-relay)
- An Ansible cookbook plugs in the Graphite URL for each host so that Uchiwa shows basic system metrics next to the checks.
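As an illustration of the metric checks mentioned above: Graphite's plaintext protocol expects one `path value timestamp` line per metric, and a metric check can simply print lines in that shape for a handler to relay. The sketch below is a hedged example of that output format, not one of our actual checks. The host name `web-01` and the metric names are hypothetical.

```python
import time


def graphite_lines(metrics, scheme, timestamp=None):
    """Format a dict of metrics as Graphite plaintext lines.

    Each line is "<path> <value> <timestamp>", where the path is the
    scheme (typically the host name) joined to the metric name.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return [
        f"{scheme}.{name} {value} {ts}"
        for name, value in sorted(metrics.items())
    ]


if __name__ == "__main__":
    # Hard-coded sample values; a real check would measure these.
    sample = {"load_avg.one": 0.42, "load_avg.five": 0.3}
    for line in graphite_lines(sample, "web-01", timestamp=1700000000):
        print(line)
    # prints:
    # web-01.load_avg.five 0.3 1700000000
    # web-01.load_avg.one 0.42 1700000000
```

Keeping the host name as the first path component makes it straightforward to build a per-host Graphite URL, which is what lets a dashboard show a host's metrics next to its checks.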