Sensu Monitoring

Sensu is near the top of the coolest projects I’ve worked on in the recent past. It’s a recommended alternative to Nagios. We started monitoring our OpenStack cloud infrastructure end to end with Sensu at a crucial time, when the org is moving towards a highly available private cloud infrastructure to support a huge customer base.

Key points on Sensu:

  • A more scalable solution for monitoring cloud infrastructure.
  • Subscription based: when new infrastructure comes up, it identifies itself (like ‘I am a webserver’) and gets tagged to the checks relevant to that subscription. With automation like Chef, this eliminates the need to install agents or define clients on the server side. Easy to manage with a configuration management system (CMS)!
  • Supports plugins written in different languages. Easy to reuse plugins from Nagios.
  • RabbitMQ is used for messaging and Redis, a NoSQL database, for data storage.
  • Lightweight application written in Ruby.
  • We had several availability issues with the Nagira API for Nagios. Compared to that, I feel the Sensu API performs much better.
  • JSON format for configuration and check results.
  • Standalone checks: sensu-client can run its own standalone checks and report results to the server.
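
To make the subscription and standalone ideas above concrete, here is a minimal sketch that builds Sensu-style JSON definitions from Python. The client name, address, the ‘webserver’ subscription, the check names and the Nagios plugin commands are all illustrative assumptions, not our actual configuration, and the checks_for helper is hypothetical: it only mimics how subscriptions are matched, it is not part of Sensu.

```python
import json

# Illustrative client definition: the node announces itself as a "webserver".
# Name and address are made up for this example.
client = {
    "client": {
        "name": "web-01",
        "address": "10.0.0.11",
        "subscriptions": ["webserver"],
    }
}

# Illustrative check definitions (assumed names and commands).
checks = {
    "checks": {
        # Published by the server to every client subscribed to "webserver".
        "check_http": {
            "command": "check_http -H localhost",
            "subscribers": ["webserver"],
            "interval": 60,
        },
        # Scheduled by the client itself; results are still reported to the server.
        "check_disk": {
            "command": "check_disk -w 20% -c 10%",
            "standalone": True,
            "interval": 300,
        },
    }
}


def checks_for(client_cfg, check_cfg):
    """Hypothetical helper: names of subscription checks that apply to a client."""
    subs = set(client_cfg["client"]["subscriptions"])
    return sorted(
        name
        for name, spec in check_cfg["checks"].items()
        if subs & set(spec.get("subscribers", []))
    )


if __name__ == "__main__":
    print(json.dumps(client, indent=2))
    print(json.dumps(checks, indent=2))
    print(checks_for(client, checks))
```

In a setup like this, the configuration management tool would only need to drop the client definition on a new node; the subscription takes care of attaching the right checks.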

Our Sensu architecture involves these components:

  • Sensu server, RabbitMQ, Redis and the Uchiwa dashboard in a two-node cluster (we plan to expand to 4 nodes across different data centers)
  • High availability: RabbitMQ clustering and Redis Sentinel. The Sensu servers assign the master automatically.
    • Redis is deployed on 3 nodes with Sentinel. If the Redis master goes down, Sentinel automatically promotes a slave to master and it is “business as usual”.
    • Here is a quick reference to implementing HA for Sensu
    • In our setup we run HAProxy on all the nodes and created a global VIP to access the components.
  • Integrated handlers to send alerts to HP Operations Manager for the NOC team.
  • Configured several metric checks to feed into Graphite (https://github.com/opower/sensu-metrics-relay)
  • Ansible cookbook plugs in the Graphite URL for the host so that Uchiwa shows basic system metrics adjacent to the checks.
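
As a rough illustration of the metric checks mentioned above, here is a minimal sketch of a check that prints load averages in Graphite’s plaintext format (metric path, value, Unix timestamp). The metric scheme and the function names are assumptions made for this example; it does not show how sensu-metrics-relay itself forwards the data.

```python
import os
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: '<path> <value> <timestamp>'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}"


def load_metrics(hostname, loadavg, timestamp):
    """Build lines for the 1, 5 and 15 minute load averages (assumed naming scheme)."""
    scheme = f"{hostname}.load_avg"
    names = ("one", "five", "fifteen")
    return [
        graphite_line(f"{scheme}.{name}", f"{value:.2f}", timestamp)
        for name, value in zip(names, loadavg)
    ]


if __name__ == "__main__":
    # Dots in the hostname would split the Graphite path, so replace them.
    host = socket.gethostname().replace(".", "_")
    # os.getloadavg() is only available on Unix-like systems.
    for line in load_metrics(host, os.getloadavg(), time.time()):
        print(line)
```

A check like this would be defined with a metric type and a handler that relays its output to Graphite; the exact handler wiring depends on the relay in use.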