Community Metrics Wrangling with MLstats and OpenShift
As loyal readers of this blog (if any such creatures did exist) might have noticed, I’ve been working with the oVirt project, which got a reboot last year when Red Hat finished open sourcing and porting to Java the previously .Net-based management for its enterprise virtualization product.
Given the new start for oVirt, the project has been keen to get a handle on its community metrics, such as mailing list activity: is it growing, what’s the mix of people coming from companies on the oVirt board compared to other organizations and individuals, and so on.
While searching for suitable tools for mailing list analysis, I came across mlstats, a nifty application for slurping mailing list archives into a database, where they can be queried and made to cough up all sorts of interesting information. The application is really easy to use, and I had it up and running on my local machine, offering up pearls of oVirt mailing list wisdom, in no time at all.
I wanted to set the app up on a remote server somewhere, complete with a cron job for running the daily stats update, and with some means of displaying certain information from the db, such as daily traffic on each of the oVirt lists. I opted to deploy this mlstats-o-matic on OpenShift, the Red Hat Platform as a Service, uh, service, on which I run this blog. OpenShift instances support mysql and cron, and the service’s port forwarding feature is great for enabling database access via one’s favored desktop query tools. More on that later.
That worked fairly well for our initial oVirt needs, but since the team I work on at Red Hat is focused on working with a broad range of open source projects, I wanted to figure out how to make it easy to use this same framework with other projects. The queries and the web pages were hard coded to the particular lists I’d chosen, so my initial take wasn’t a very cleanly repeatable solution.
More hackery ensued, and I changed the setup around to use bash scripts to assemble the queries and html pages based on whatever list of mailing lists a user might select. I’m sure I could have done it all much more elegantly, but it’s all up at github where anyone is free to set me straight. :)
I set the whole thing up as an OpenShift quickstart, so if you’re interested in engaging in some mailing list analysis action of your own, head over to Github and check it out. I made a short screencast of the process to give you an idea of how easy it is to get up and running:
The cron job will go off and update the data each day, and there really isn’t any maintenance to do. If you want to get at the data with your desktop mysql query tools, the command “rhc-port-forward -a appname” will make the database at OpenShift accessible to you locally.
Please give it a shot, and let me know how it works for you!