The Shining Path of Least Resistance

LeastResistance.Net

Archive for the ‘zenoss’ Category

##MonitoringSucks Terminology: Zenoss Breakdown

Posted by mattray on August 5, 2011

Following up on the 07/21/11 ##monitoringsucks IRC discussion on terminology, I thought I’d break down Zenoss as an example of how I believe the terminology applies.

Primitives

  • metrics: This is the raw monitoring data. Zenoss supports a wide variety of collection techniques, and metrics are stored as “Data Points” in RRD.
  • context: Zenoss has “Thresholds” attached to the “Data Points” which trigger “Events”. Thresholds may be based on exceeding a value, matching a specific value, falling within (or outside) a range, or Holt-Winters forecasting. The Event context contains the originating resource (device and IP), event state (new, acknowledged, suppressed), severity (0-5), event summary, specific details (message) and an event id.
  • resource: Zenoss has Devices, which are the direct source of the metrics.
  • event: Maps directly to Zenoss’ Events, with the context and actions being part of the Event subsystem.
  • action: Zenoss has a fairly rich Event system, with a wide variety of possible ‘actions’ when an Event enters the system (whether by a Threshold or some other source). It may be dropped, deduplicated, transformed, sent to history, trigger event commands or generate alerts. Correlation may be done with transforms in Python.
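To make the primitives concrete, here is a minimal Python sketch of a metric passing through a range threshold and producing an event. This is not Zenoss code: the class names (`Metric`, `RangeThreshold`, `Event`), the field names and the default severity are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    resource: str     # the Device the value came from
    name: str         # the Data Point name (hypothetical, e.g. "cpu_pct")
    value: float
    timestamp: float


@dataclass
class Event:
    resource: str
    summary: str
    severity: int     # 0-5, mirroring the Event context described above
    state: str = "new"


@dataclass
class RangeThreshold:
    datapoint: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    severity: int = 3  # assumed default, not taken from Zenoss

    def check(self, metric: Metric) -> Optional[Event]:
        """Return an Event if the metric falls outside the range, else None."""
        if metric.name != self.datapoint:
            return None
        too_low = self.minimum is not None and metric.value < self.minimum
        too_high = self.maximum is not None and metric.value > self.maximum
        if too_low or too_high:
            summary = (f"{metric.name} = {metric.value} outside "
                       f"[{self.minimum}, {self.maximum}]")
            return Event(metric.resource, summary, self.severity)
        return None
```

For example, `RangeThreshold("cpu_pct", maximum=90.0).check(Metric("web01", "cpu_pct", 97.5, 0.0))` returns an `Event` in the `new` state, while a value of 50.0 returns `None`.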

Components

Model

Zenoss tries to create a model of all the monitored infrastructure.
Individual resources are presented as “Devices”, something with an IP address that may or may not map to a single node.
Devices are organized in a single “Device Class” which determines how they are modeled and how and what metrics are collected.
“Modeling” in Zenoss is the attempt to discover all the attributes of a device (network interfaces, filesystems, installed hardware and software, etc.).
Modeling is performed by “Modeling Plugins” (attached to Device Classes or individual devices) which may use a variety of protocols to discover what is on a Device (SNMP, SSH, WMI, etc.).
Device Classes have “Monitoring Templates” attached to them that define how and what to monitor.
Modeling Plugins and Monitoring Templates may be reused, overwritten and extended by Device Classes.
Zenoss may be configured to automatically discover the nodes on a network range or subnet and create a network map of all the devices.
Devices may be added to a single “Location”, which may be mapped and presented in the UI with a Google map.
Devices may also belong to multiple Groups and/or Systems (essentially 2 separate tag hierarchies).
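The way Device Classes reuse and override Monitoring Templates can be pictured as a lookup that walks up a hierarchy. The sketch below is an illustration only, not how Zenoss implements it; the `DeviceClass` class, the `resolve_templates` method and the template names are hypothetical.

```python
class DeviceClass:
    """A node in a hypothetical Device Class hierarchy."""

    def __init__(self, name, parent=None, templates=None):
        self.name = name
        self.parent = parent
        # Templates defined directly on this class: template name -> definition
        self.templates = dict(templates or {})

    def path(self):
        """Build a slash-separated path such as /Server/Linux."""
        prefix = self.parent.path() if self.parent else ""
        return prefix + "/" + self.name

    def resolve_templates(self):
        """Merge templates from the root down; closer classes override."""
        merged = self.parent.resolve_templates() if self.parent else {}
        merged.update(self.templates)
        return merged


# Illustrative names only.
server = DeviceClass("Server", templates={"Device": "snmp-basic"})
linux = DeviceClass("Linux", parent=server,
                    templates={"Device": "snmp-linux", "FileSystem": "df"})
```

Here `linux.resolve_templates()` yields the Linux-specific `Device` template plus `FileSystem`, while `server.resolve_templates()` still returns only the basic one, so an override in a subclass never leaks back up the tree.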

Collection

Zenoss supports a wide variety of availability and performance monitoring, from both active and passive sources.
Most protocols map to a specific daemon, responsible for collecting the data and pushing it into the system to be stored in RRD files.
RRD has a variety of ways for storing data, but the metrics are represented numerically with a timestamp.
Out of the box, Zenoss monitors:

  • ICMP: ping (zenping)
  • JMX: performance monitoring (zenjmx via the zenjmx ZenPack)
  • TCP: port checks (zenstatus)
  • SNMP: performance, process-monitoring and receive traps (zenperfsnmp, zenprocess, zentrap)
  • SSH/Telnet: v1/v2 (zencommand)
  • Syslog: receive syslog messages (zensyslog)
  • WMI: Windows event log (zeneventlog)
  • Zenoss can reuse Nagios and Cacti plugins as well

There are quite a few community extensions (ZenPacks) providing additional collection features.
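As an illustration of the simplest kind of active availability check in the list above, a TCP port check boils down to attempting a connection with a timeout. This is a generic standard-library sketch, not the zenstatus implementation; the function name and default timeout are assumptions.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A collector would call something like `port_is_open("192.0.2.10", 22)` on a schedule and turn a `False` result into an availability event.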

Event Processing

As mentioned in the section on primitives, Zenoss has an Event system that handles context, events and actions.
Events may use their Devices, Device Classes, Locations, Systems and Groups for additional context.
Zenoss Events are stored in a MySQL database.
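Deduplication, one of the actions mentioned under primitives, can be sketched as collapsing repeated events onto a single record with a counter. The fingerprint used here (device, summary, severity) is an assumption chosen for illustration, as are the `Event` and `EventStore` names; it is not a description of Zenoss’ actual dedup fields.

```python
from dataclasses import dataclass


@dataclass
class Event:
    device: str
    summary: str
    severity: int
    count: int = 1


class EventStore:
    """Keeps one active record per fingerprint and counts repeats."""

    def __init__(self):
        self._active = {}

    def add(self, event: Event) -> Event:
        # Assumed fingerprint: events matching on all three fields are duplicates.
        key = (event.device, event.summary, event.severity)
        existing = self._active.get(key)
        if existing is not None:
            existing.count += 1
            return existing
        self._active[key] = event
        return event

    def active(self):
        return list(self._active.values())
```

Adding the same “disk full” event three times leaves one active record with `count == 3`, which is what keeps an event console readable when a check fails repeatedly.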

Analytics

Correlation of events is done in the Event system, written in Python.
Graphing of metrics is available with RRD graphs and all the variations supported therein (single/multiple values, stacked graphs, multiple devices).
The Event Console makes it easy to quickly search and filter specific event values.
Example reports are included but writing custom reports is difficult because of the disparate storage mechanisms for metrics, events and configuration.

Presentation

Zenoss has a featureful UI with an emphasis on monitoring thousands of nodes at a time and rolling up events in the Event Console.
There is a configurable dashboard with a number of portlets that may be added (reports, events, graphs, web sites, etc.).
It is a webapp mostly using JavaScript (ExtJS) on top of the Python Zope application server.
Lightweight ACLs are available and multiple users are supported for

Configuration

The user interface for Zenoss is focused on making it easy to manage monitoring thousands of devices by configuring their Device Classes and applying Devices to them (as opposed to individual devices).
While configuration is primarily through the UI, there are tools for bulk-loading devices from files or scripting as well.
There is a command-line interactive interface to the object database (zendmd) that can be used to query and alter the monitored infrastructure.

Storage

Metrics are stored in RRD.
Events are stored in MySQL.
Configuration and relationships between objects are stored in the Zope Object Database (ZODB).

API

Zenoss has a published JSON API for interacting remotely, with examples in Python and Ruby (most of the UI uses these APIs).
There is also published Developer Documentation for extending and writing plugins.
The zendmd tool may also be used to interact with Zenoss programmatically via scripting.
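To give a feel for what a call to a JSON API of this kind looks like, the sketch below only builds a request body and sends nothing. The envelope shape (`action`, `method`, `data`, `type`, `tid`) is the Ext.Direct-style RPC format as I recall it from the published Python example, and the `DeviceRouter` / `getDevices` names in the usage note are likewise assumptions; check all of them against the Zenoss API documentation for your version.

```python
import json


def build_router_request(router: str, method: str, data: list, tid: int = 1) -> str:
    """Serialize one RPC call as a JSON body (assumed envelope, see above)."""
    envelope = {
        "action": router,   # assumed: name of the server-side router
        "method": method,   # assumed: method exposed by that router
        "data": data,       # list of argument objects
        "type": "rpc",
        "tid": tid,         # transaction id, echoed back in the response
    }
    return json.dumps([envelope])
```

A call such as `build_router_request("DeviceRouter", "getDevices", [{"limit": 10}])` produces the body a client would POST, with a JSON content type and an authenticated session, to the matching router URL.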

Conclusions

Zenoss tries to provide a framework for monitoring thousands of machines that is flexible enough to contain network devices, servers and services. The terminology and taxonomy that emerged from the IRC discussion fit fairly well; hopefully we can at least attempt to compare apples to apples when discussing different monitoring implementations.

It would probably be worthwhile to make a future post breaking down the strengths and weaknesses of Zenoss’ approach as well as which components would be easiest to reuse within other systems.

Posted in monitoring, monitoringsucks, zenoss

##MonitoringSucks Terminology (first stab)

Posted by mattray on July 12, 2011

Inspired by the recent ##monitoringsucks discussions, I thought I’d add my thoughts on creating a common set of terminology so we can start making progress.

There are a multitude of monitoring solutions out there, but most can be categorized and described with the following basic terminology and components:

Each of the major components could be a separate, single-purpose application. With consistent APIs and interchangeable implementations, best-of-breed solutions could arise. A catalog of monitoring tools could be cultivated and maybe monitoring wouldn’t suck as much.

Collection

This is the gathering of raw data that we care about for monitoring. There are 3 components to Collection:

Metrics

The data points that you want monitored. These can be OIDs, metrics, REST calls or whatever. They may be performance and/or availability, active and/or passive. This is the raw data.

Thresholds

Metrics have a range of legitimate values; thresholds are the limits on those values. Thresholds may apply to individual metrics or to combinations of metrics.

Collecting

The actual process of gathering data varies depending on the metrics. There is a wide variety of monitoring protocols (SNMP, WMI, Syslog, JMX, etc.), so we need to document how we collect the metrics.

Model

This is the representation of what you are collecting, a collection of metrics and thresholds. The Model is a collection of Nodes. A Node is typically a single machine, but may cover multiple metrics from separate machines or services (think services and clusters) depending on the implementation. There may be no Model whatsoever (just lists of metric checks).

Events

Events are what happens when a threshold is violated. They may be suppressed, de-duplicated and possibly correlated with other events. There may be dependencies between Nodes or correlations with other Events; implementations vary.

Alerting

Separate from Events, alerting is the means to notify people and systems that an Event requires attention. There are numerous mechanisms for alerting (email, paging, Asterisk, log, etc.) and ideally the Alerting component has the concept of users, schedules and escalation rules.
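Escalation rules can be as simple as a list of “who to notify after how long”. The sketch below is one hypothetical way to model that; the `EscalationStep` type, the contact names and the minute-based delays are all invented for illustration, and a real Alerting component would also need schedules and delivery mechanisms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # how long the Event has gone unacknowledged
    contact: str        # who to notify once that much time has passed


def contacts_to_notify(steps, minutes_unacknowledged):
    """Return every contact whose escalation delay has already elapsed."""
    ordered = sorted(steps, key=lambda step: step.after_minutes)
    return [step.contact for step in ordered
            if step.after_minutes <= minutes_unacknowledged]


# Hypothetical policy: on-call first, then the team lead, then a manager.
policy = [
    EscalationStep(0, "oncall"),
    EscalationStep(15, "team-lead"),
    EscalationStep(60, "manager"),
]
```

With this policy, an Event left unacknowledged for 20 minutes has reached the on-call engineer and the team lead, but not yet the manager.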

Presentation

There are 2 pieces to the Presentation component:

UI

The monitoring solution may or may not have (or need) a UI. This is a visual representation of the Model, Events and possibly Alerts. There may be a Dashboard rolling up different views into the information captured by the monitoring solution.

Reporting

Ideally the data captured by the monitoring solution is available for whatever reporting you want to do. It may be in SQL databases, RRD or some other format but the ability to access the data and create new reports is essential.

Cross-cutting Concerns

API

Ideally every component should have published APIs for interacting with it programmatically and/or remotely. Without an API, monitoring tools become less and less relevant in the face of increasing automation.

Configuration

As with APIs, all monitoring framework components need to be easily automated by configuration tools.

Storage

Where metrics are stored. There are lots of choices, but whichever is used, the data should be accessible for reports and via an API.

Posted in monitoring, monitoringsucks, zenoss

What Would You Say You Do Here?

Posted by mattray on May 25, 2008

I get asked that question fairly frequently, not by people who know Open Source software, but by people outside my realm of employment. “Community Manager for an Open Source systems management company” has gotten more than a few quizzical looks as they slowly back away. I tell people I encourage people to use our software, even if they don’t pay for it, which just creates more questions.

I’ve been at my new job for nearly 2 months and I’m just starting to feel like I’m getting my head around everything. As the Community Manager, my job is a weird hybrid between customer support, development and guerrilla marketing. On any given day I can plan on working on something like reviewing documentation and assisting a community member with their ZenPack (a Zenoss extension), and end the day with a blog post, a dozen emails and several discussions about supporting another Open Source project. Some days I miss diving into a code-cocoon where the whole day disappears into a blur of writing software.

Keeping up with everything can be hard. I’ve recently started using the Getting Things Done methodology (a blog post about that soon) and I’ve found it really helps. The hardest thing is that I rarely feel I can focus on something for several days; I have too many spinning plates and have a hard time tuning everything else out. Hopefully with better prioritization and GTD I can fix that. I could also spend as much or as little time as I like on any subject I come across. I could spend all day on IRC helping users, read documentation until I figure everything out, or learn Python as well as I’d like. But there is almost always something of higher priority bumping my schedule, so I’m keeping much busier than any of my last few jobs kept me.

This isn’t to complain, though; I actually enjoy my job quite a bit. There’s constant variety so I’m never bored and I enjoy engaging most of the people I come across. Zenoss has a very passionate user-base, which is one of the things I’d noticed when I was evaluating the company. This makes my job a little easier; it feels good to work on a project that you feel proud of, as opposed to some random software that someone, somewhere is using (quite possibly not by choice). I really wanted to work for an Open Source company, or at least be in a position where I could contribute substantially to one, so I guess I’m doing pretty good.

So there you have it, hopefully the Bobs are satisfied.

Posted in career, community, sausage, zenoss | 1 Comment

OSX IRC Client Shootout

Posted by mattray on May 2, 2008

My IRC needs are quite basic: I need an OSX IRC client I can leave open all day without having it crash or consume 100% of my CPU and/or memory. I’ll be hanging out on the #zenoss channel on the irc.freenode.net servers as part of my new job. Open Source is preferred, and I’d like configurable Growl integration and detailed, searchable logging; but stability is my #1 priority. Below is my 15-minute-per-client IRC shootout.

The Contestants:

Colloquy 2.1:
Colloquy was the first I tried because I remembered it seemed pretty good from a few years ago, but then my company started blocking IRC and I never got around to using it again. It is GPL and configuration seemed to go fine except for the fact that every time the window lost focus, it started bouncing on the Dock. I fixed that annoyance, but after about 10 minutes of IRC my CPU hit 100% and Colloquy was the culprit. That could be related to this ticket, but that was a major strike against it. After switching to my new MacBook Pro, I figured the PPC bug would be gone and it would be OK, but then channels would open and stay empty, so I decided I’d had enough.

Irssix .7:
Very minimal and GPL, I got online with no fuss. No themes, just black on white and it seemed to resist my attempts at applying different fonts. No Growl or offline logging either. Never noticed CPU or memory usage. It set a very stable, no-frills baseline.

Ircle 3.1.2:
30-Day shareware, interesting project because they still support OS 9 and the 68K platform. Kinda ugly and complicated out of the box, with an annoying sound theme on by default. irc.freenode.net wasn’t on the list of 2300+ servers as far as I could tell. Configuration was also complicated, didn’t see Growl integration, but auto-logging was available (and I assume searchable outside the application). Crashed when I shut it down (report sent).

Snak 5.3.3:
30-Day shareware, but the money is donated to charity so that’s a positive. The first run started with a setup assistant, which seemed innocuous enough and it worked immediately. Themes were mostly pleasing pastels and there was a nice transparency slider. Growl integration in the action list, where you could trigger highlighting or other actions based on input was a very slick feature. Memory and CPU usage seemed minimal. Logging sent to an external file with configuration for the formatting. Everything seemed stable and straightforward, no complaints in my 15 minutes.

X-Chat Aqua .16:
I used to use XChat for Linux back in the day; this is the GPL OSX Aqua update. For eye-candy it had a transparency slider and extensive color support but no themes. Logging to external files is supported. The event notifications configuration is quite nice. You can choose Growl, indicate on or bounce the Dock or a sound file for just about everything IRC related with toggling for when XChat is the foreground application. I never noticed memory or CPU usage. Apparently it hasn’t been updated in a while, but it seems to be working just fine. Occasionally OSX’s Spaces will forget to pin it to all desktops, but that’s just an odd bug for now.

And the winner is…

X-Chat Aqua hit all the right features and seems stable enough. If it turns into a resource hog after a few days, I’d probably give Snak another look. I’m sure I probably overlooked some other IRC clients or missed out on the greatness of one of the ones I did review. Feel free to leave feedback and maybe if X-Chat stops being good enough I’ll reevaluate the competition.

PDF Scribd link.

Posted in community, osx, zenoss | 3 Comments

On my way to LinuxFest NW

Posted by mattray on April 24, 2008

Tomorrow I’ll be flying to sunny(?) Bellingham, WA for LinuxFest Northwest. It’s my first business trip for my new job. If you happen to be there, look for me at my Zenoss presentation or around the conference.

Posted in community, zenoss | 2 Comments

My New Gig

Posted by mattray on April 18, 2008

In case you hadn’t heard, I recently left my old job for a new job at Zenoss, an Open Source Enterprise Systems Management company. I am now the Community Manager, which is essentially a technical liaison between Zenoss and the Open Source developer community. My job is to help build, strengthen and support Zenoss’ community because they are the ones who use and help support, direct and write the software we provide as a company. I’ll be active in every community-facing aspect I can, and you’ll find me on the forums and mailing lists, IRC, blogs and email asking lots of questions and trying to find as many answers as I can.

While my last job was at one of “The Big 4” as a Java developer, I had a Linux consulting company on the side and I’ve been involved with the Open Source community in one form or another for over 10 years. I’ve worked in Systems Management, retail, distributed computing, banking, scientific and educational software over the years and I’ve been at several startups and founded a few myself. While I’m fairly new to the Python community, I’ve coded in Java, Ruby, Perl, C, Lisp and many other languages professionally and for fun, so I hope to get into the swing of things pretty fast.

I’ll be posting stuff here and at the Zenoss Blog from time to time on my experiences. Feel free to contact me with feedback, questions or answers:

  • Email: mray@zenoss.com
  • AIM: mrayzenoss
  • Twitter: mattray
  • IRC: #zenoss on irc.freenode.net as mrayzenoss

Posted in career, community, python, zenoss | 6 Comments