
Hadoop and Ganglia 3.1

A quick note to anyone setting up a new Hadoop cluster and hoping to quickly use the built-in Ganglia metrics collection (which you should! If it moves, graph it!): this works out of the box with Ganglia 3.0, but the protocol changed with Ganglia 3.1.

The official GangliaMetrics page talks about this, and talks about patching (the patch is already included if you use the Cloudera releases) but doesn’t go into more detail than that. I recently set up a new cluster, and remembered there was something I had to change in the default config to make it work… After inquiring (and finding the comment I left in my old config file!) I remembered: you must change the default class to have “31” (i.e. Ganglia 3.1) on the end.

For example, the default config file (replace @GANGLIA@ with your multicast address):

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=@GANGLIA@:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=@GANGLIA@:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=@GANGLIA@:8649

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rpc.period=10
rpc.servers=@GANGLIA@:8649

Is changed to this:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=@GANGLIA@:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=@GANGLIA@:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=@GANGLIA@:8649

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=@GANGLIA@:8649

Restart the cluster, and the graphs will appear under each host in the Ganglia interface.
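
If you have a few of these files to fix, the edit above can be scripted. Here is a minimal sketch using sed; the function name use_ganglia31 is made up for this example, and it assumes the class lines look exactly like the ones shown above (nothing after GangliaContext on the line).

```shell
# Append "31" to each Ganglia context class line of a metrics properties
# file and print the result on stdout. Lines that already end in
# GangliaContext31 do not match the pattern, so running it twice is harmless.
use_ganglia31() {
    sed -E 's/^[A-Za-z0-9_]+\.class=org\.apache\.hadoop\.metrics\.ganglia\.GangliaContext$/&31/' "$1"
}
```

Something like `use_ganglia31 hadoop-metrics.properties > hadoop-metrics.properties.new` (the file name is an assumption, use whatever your metrics config is called) lets you diff the result before swapping it in.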

There is a LOT of detail in these graphs, with metrics ranging from DFS (things like bytes written, and how many operations were transferred from other nodes) to the JVM (monitor those heap memory sizes!)

This is probably old news to most people I’m sure, but I have a rule: if I couldn’t find it within 30 minutes, I write it down, and maybe this will help someone in the same boat as me :)

Naglite2 finally released

It’s been a long time coming (even longer than CactiView!) but finally I’ve cleaned up (as much as possible) and released Naglite2, a full-screen, easy-to-read status screen backed by Nagios.

Naglite2

Perfect for a NOC or operations room, you get an at-a-glance view of the status of your hosts and services, which not only helps in sudden emergencies but also incentivises your staff to get a “clean board” and fix the remaining niggly problems in your network!

The screen also compresses down quite nicely into a mobile browser, perfect for checking on the status of your systems whilst on the move.

The code is up over at Github, feel free to use/distribute/fork and modify or send me comments.

Get Naglite2 now

CactiView

It’s been a while coming and I apologise to those who have been waiting but finally I have publicly released CactiView.

CactiView

All the details are in the README inside the tar.gz, but here is a quick description for those who do not know:

CactiView gives you a clean and simple view of one graph from Cacti at a time. You can name the graphs, and set the automatic rotation duration.

The display includes one main large graph for the last 12 hours, 3 smaller graphs with longer time periods and a couple of other bits and bobs of information.

Please let me know what you think.

CactiView is available for download here: http://denness.net/downloads/cactiview-0.1.tar.gz

Or on Github here: http://github.com/lozzd/CactiView/

Setting up a DRAC card using Debian

Today I was faced with the problem of setting the IP address of a DRAC (dedicated Dell Remote Access Card, which are super by the way, and a lot lot quicker than Sun’s effort) in a Dell server that was powered on, running something production on the Debian OS, and I had no physical access to the server, so no rebooting for configuration was possible.

Now, if you have an idea of what IP address is on that card already you can talk to it remotely, which isn’t a problem. The problem was, I had no idea what the IP address was currently set to and it wasn’t DHCP. On top of that, I had no copy of the racadm command, the Dell tool to control the card. (omconfig is available on Debian now which is nice, but omconfig bmc is a deprecated command and tells you to use racadm!)

Let me tell you how to set the IP address with just a simple install of Debian and little effort. (I’m sure this is on the internet somewhere but I had difficulties finding it. I expect my Google-fu was weak today.)

Install IPMItool from apt:

apt-get install ipmitool

Load the IPMI driver into /dev/ so we can talk to the card:

/usr/share/ipmitool/ipmi.init.basic

You can now print the current config of the card:

ipmitool lan print 1

Set the new IP address up, if you want to configure it manually:

ipmitool lan set 1 ipaddr 172.0.0.10
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 172.0.0.1
ipmitool lan set 1 ipsrc static

Or set it to DHCP if you want:

ipmitool lan set 1 ipsrc dhcp

Check your settings:

ipmitool lan print 1

Reboot the DRAC. You may not have to do this, but I did (and/or I’m impatient):

ipmitool mc reset cold

Within a minute the card should be up and responding to ping. Hurrah!
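
If you have more than one card to do, the manual steps above can be wrapped up. This is only a sketch: drac_static_cmds is a name I made up, and it just prints the same ipmitool commands shown above (in the same order) rather than running them, so you can read them before anything touches the card.

```shell
# Print (but do not run) the ipmitool commands for a static address.
# Arguments: LAN channel, IP address, netmask, default gateway.
drac_static_cmds() {
    channel=$1 ip=$2 netmask=$3 gateway=$4
    echo "ipmitool lan set $channel ipaddr $ip"
    echo "ipmitool lan set $channel netmask $netmask"
    echo "ipmitool lan set $channel defgw ipaddr $gateway"
    echo "ipmitool lan set $channel ipsrc static"
}
```

Run `drac_static_cmds 1 172.0.0.10 255.255.255.0 172.0.0.1` to check the output, then pipe it to sh as root once you are happy with it.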

Note: I tried these on a DRAC4 card, and whilst it looked like it was accepting my instructions, it seems it was in fact completely ignoring me. I had to configure this one manually in the BIOS. These commands work fine on a DRAC5 though.

Finding a Web Browser for constant page reloading

One of the things I have done whilst working at Last.fm is create a simple system whereby critical monitoring is displayed on screens that we have hanging from the ceiling. There is one in each corner of the room, and opposite monitors display the same thing (e.g. two monitors display our key Cacti graphs, and two display Nagios monitoring output, so everyone in the room can see it). This is achieved through a simple dual output graphics card, and a couple of two-way monitor splitters (and a lot of cable!)

The software itself is simple: the data is displayed using some PHP scripts I wrote specifically for output on these 22″ screens. They are hosted on our servers, so all that is required to display them is a web browser. You can see these two pages in action here (Naglite2) and here (CactiView).

Very simple, or so you would think. The problem is, with the nature of this data, it needs to be refreshed constantly. The graphs are in a rotation controlled by a Javascript frame that changes to a new URL every 20 seconds, and the services/host up/down notification screen updates with a meta refresh every 5 seconds. Again, sounds pretty simple. Here are my findings:

Initial Configuration – Ubuntu Linux with Firefox 3

Being my browser of choice anyway, I set everything up in Firefox to start with. We figured a Linux desktop would be more stable for hosting this than Windows. F11 to fullscreen mode on both the monitors, and off it goes. We didn’t notice it too much at the time, but the way it deals with refreshing the images is pretty annoying: it clears the page and loads the images one by one, leading to a noticeable flashing of the screen every time it reloads the page. Not only that, it was the worst browser we used for memory, reaching 90% RAM usage (on a 2GB machine) after just a day. At this point, not only did it become very sluggish, but it would stop displaying the graphs randomly, and eventually end up severely corrupting all the images, mixing them together in an interesting fashion. Connecting via VNC every day and restarting Firefox became a bit of a chore, so we decided to give up and try something else.

Second configuration – Ubuntu Linux with Opera

Straight away Opera was performing much better than Firefox. It seemed to almost pre-load the images for the next set of graphs before it refreshed the page, leading to no flickering of the screen, just seamless re-loading of the page. It also managed a week before showing any signs of slowing down, but after that point the graphs started disappearing again. Opera had suffered the same fate as Firefox… Using all the memory available on the machine.

We also had another little problem.. We have the time printed in the bottom right of the screen (as text rather than an image) and even with cache control headers forced, Opera was caching the pages. The clock would jump around by 5-10 minutes as each graph appeared. I discovered that Opera has some advanced preferences that let you disable the cache completely. Whilst this fixed the problem with the clock, it meant that it then only survived 2-3 days before exhausting the memory. We put up with this for a number of months, before deciding to move on.
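
A leak like the ones above is much easier to compare between browsers if you log the memory over time rather than eyeballing it over VNC. Here is a minimal sketch for the Linux setups; rss_kb and log_rss are made-up names, it reads the VmRSS line from /proc so it is Linux only, and it is not what we actually ran at the time.

```shell
# Print the resident set size (in kB) of one process, read from /proc.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print a timestamped RSS sample for a PID every N seconds (default 60)
# until that process exits.
log_rss() {
    pid=$1 interval=${2:-60}
    while kill -0 "$pid" 2>/dev/null; do
        printf '%s %s\n' "$(date +%FT%T)" "$(rss_kb "$pid")"
        sleep "$interval"
    done
}
```

Point log_rss at the browser’s PID and redirect the output to a file, and you get something you can graph (of course).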

Hello Webkit

At this point, Russ and I thought it was about time we gave a Webkit based browser a shot. Konqueror seemed a good choice (its KHTML engine is what Webkit was forked from).. We installed kubuntu-desktop, and got Konqueror running, but had trouble getting it into a proper full screen mode. Eventually we managed to hide the tab bar, but the status bar was still there. Although we found some hacks to remove it, we wanted to try something in particular, which ended up with a radical change…

Current configuration – Windows XP and Google Chrome

We really wanted to give Google Chrome (Chromium) a go on Linux, but unfortunately it’s not quite at its prime yet… More than anything, we couldn’t get the pages to load at all because the HTTP Auth dialog has yet to be coded. (It simply doesn’t appear. As a side note, using the user:password@ URL notation makes it crash!)

After a quick hour of installation, drivers and updates, we had the screens back up and running with XP and Chromium. The nice points so far have been:

  • Turning the two different pages we use into their own Apps using the Google Gears “Create application shortcut” menu option. Now we have a single icon to click to open one window, and another for the other.
  • Separate processes – Now we can monitor which tab is using the RAM, and just restart the offending process if it becomes a problem
  • The biggest win by far – It leaks very little memory. So far after using it for a week, the process running the text only Nagios view has not used any more RAM than it did when we started it (35MB). The Cacti graphs screen, reloading graphs 24/7 for a week every 20 seconds, has used just 80MB (40MB when it started). The reason for this is obvious if you watch the usage: it loads the page and the memory increases by 5MB, then after a few seconds it drops by 5MB again. So there is a small memory leak somewhere but it seems Chrome is cleaning up after itself almost immediately, something which the other 2 browsers failed miserably at.

The overall functionality of the system is much the same.. I have compiled a couple of exe’s so that one switches off the displays and one turns them back on again (This combined with Task Scheduler means we save the planet whilst we’re not at work!) and VNC server functions actually better on Windows than on Linux (for some reason the secondary monitor displayed as a black screen on Linux, so you could control but not see it).

Downsides

The only downside of the Google Chrome based solution is: Webkit doesn’t support “text-decoration: blink”! In the image linked above, you can see we use the text CRITICAL for a service that is broken, and DOWN for a host that is having an issue. These used to blink, which was a nice touch to draw your eye to the issue. This is about the only valid use of “text-decoration: blink” I can think of, but unfortunately the webkit developers have chosen not to support it. Any support on this ticket would be appreciated!

We’re currently using the bleeding edge dev version, simply because it was the only version that had F11 Full screen mode in. This works very well, and it’s also very stable for a bleeding edge release (although obviously we aren’t using it like a regular browser).

Fin

If you’re after a browser that can handle sitting there all day and night happily refreshing a page, and you don’t mind running Windows (for now, anyway) then it seems Google Chrome may be your best bet. I will continue to evaluate its performance and maybe one day we can find something even better.

Any comments are welcome and we’re still open to suggestions, although I’m pretty happy I won’t have to restart Chrome for a few months if this trend continues!

Last.fm Beta – Yay!

Current Mood: Happy

So we launched a super exciting new beta today.. It’s not very finished, and it’s going to get a lot better, but I’m very excited and here are some quick reasons why.

  1. Activity feeds. For me, I can remember what I shouted, who I added as my friend, what forum posts I made, and more. For my friends, the same, and I can see what’s going on with them. For any other resource: interesting stuff that people have done. Simple but so effective because it’s live.
  2. Live updating charts. This makes me happy, because the charts look more like Audioscrobbler ones, and not only that, they update every single damn play. Yay! Every single play means something new to look at!
  3. Notifications. Easy way to see shoutbox posts and other stuff, other than checking my email.
  4. On the fly recommendations. Again, live updating goodness. No need to explain that!
  5. Library. Big, shiny, pretty view of everything you ever played. And finally you can delete that stuff you thought “jesus, why did I play that”? Apparently I listened to 50 Cent! I never realised!
  6. Loved Tracks. They’re finally useful! Remember those tracks you loved but never remembered because we didn’t have them streamable?
  7. The design. I wasn’t sure about it at first, but I think it’s looking pretty nice. Much more up to date than it was before, but it’s got a little way to go.

There is tonnes more awesome stuff going on, and more stuff to be tweaked and improved on, and cool stuff to be added; we’re not done yet! But it’s 11:30 and I’m not exactly sober. Goodnight!

IRC and BES and You

Current Mood: Cool

I got this wonderful Blackberry device courtesy of work, since I’m on call and people want emails answering quickly etc, etc.

The miracle of BIM and Google Talk is fantastic.. lots of ways to talk to my fellow operations coworkers, but there was something missing. We use good old IRC at Last.fm to communicate, so when something goes a bit wrong it’s nice to be able to jump in and see what’s gone on (or whether no one is fixing anything and it’s up to you..!)

On a first search there were plenty of good IRC clients around. Unfortunately I couldn’t get any to work… They just said disconnected from server. Using MidpSSH I telnet’d to the server and got a connection refused.. Then I changed the connection method to “TCP” and it worked fine. Great! But no such option exists in any IRC client (Mobilirc is the best one at the moment it seems).

So, the BES won’t forward the traffic, the BES isn’t even managed by us, and both apps are open source. Let’s delve into the code!

else if ( spec.blackberryConnType == SessionSpec.BLACKBERRY_CONN_TYPE_DEVICESIDE ) {
    conn.append(";deviceside=true");
}

References to “deviceside”… basically it proxies via the BES when deviceside=false, which is the default if not specified. Funnily enough, Mobilirc doesn’t specify this, so I jump in and add it to the connection string, so the line now looks like this:

connector = (StreamConnection) Connector.open("socket://" + host + ":" + port + ";deviceside=true", Connector.READ_WRITE);

After a couple of hours of trying to get the Blackberry Development Environment working for me, I managed to get a .jar, .jad, .alx, .cod, and using javaload, got it on my device and SUCCESS! IRC running, backgrounded, highlights, always on. Hurrah!

I don’t know if this affects anyone, or if anyone else really cares, but if you do, let me know and I’ll send you the stuff. At least we’re happy now ;) and I’m happy that I still vaguely understand Java! :D
