A quick note to anyone setting up a new Hadoop cluster and hoping to quickly use the built-in Ganglia metrics collection (which you should! If it moves, graph it!): this works out of the box with Ganglia 3.0, but the protocol changed with Ganglia 3.1.
The official GangliaMetrics page mentions this and talks about patching (the patch is already included if you use the Cloudera releases), but doesn’t go into more detail than that. I recently set up a new cluster and remembered there was something I had to change in the default config to make it work… After inquiring (and finding the comment I left in my old config file!) I remembered: you must change the default class name to have “31” (i.e. Ganglia 3.1) on the end.
For example, the default config file (replacing @GANGLIA@ with your multicast address):
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=@GANGLIA@:8649
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=@GANGLIA@:8649
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.period=10
jvm.servers=@GANGLIA@:8649
rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rpc.period=10
rpc.servers=@GANGLIA@:8649
Is changed to this:
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=@GANGLIA@:8649
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=@GANGLIA@:8649
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=@GANGLIA@:8649
rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=@GANGLIA@:8649
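If you have more than a handful of nodes, you may not want to make that edit by hand. Here is a minimal sketch of how the change could be scripted in Python. The function name use_ganglia31 is my own invention, and the script assumes your config looks like the one above, with one key=value pair per line. It only touches uncommented ".class" lines that end in GangliaContext, so running it twice does no harm.

```python
import re

# Matches an uncommented "<context>.class=" line whose value is the
# Ganglia 3.0 context class (i.e. it does not already end in "31").
_OLD_CONTEXT = re.compile(
    r"^([ \t]*[\w.]+\.class[ \t]*=[ \t]*"
    r"org\.apache\.hadoop\.metrics\.ganglia\.GangliaContext)[ \t]*$",
    re.MULTILINE,
)


def use_ganglia31(text: str) -> str:
    """Return the config text with GangliaContext switched to GangliaContext31.

    Hypothetical helper for illustration; it is not part of Hadoop or Ganglia.
    """
    return _OLD_CONTEXT.sub(r"\g<1>31", text)


if __name__ == "__main__":
    sample = (
        "dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext\n"
        "dfs.period=10\n"
        "dfs.servers=@GANGLIA@:8649\n"
    )
    print(use_ganglia31(sample), end="")
```

To use it on a real file, read the file's contents into a string, pass it through use_ganglia31, and write the result back (keeping a backup of the original first is probably wise).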
Restart the cluster, and the graphs will appear under each host in the Ganglia interface.
There is a LOT of detail in these graphs, with metrics ranging from DFS (things like bytes written, and how many operations were transferred from other nodes) to the JVM (monitor those heap memory sizes!).
This is probably old news to most people, but I have a rule: if I couldn’t find the answer within 30 minutes, I write it up, and maybe this will help someone in the same boat as me 🙂