June 15, 2008

Troubleshooting Hyperic HQ with HQ Health

I’ve posted a translation of this article on HyperForge, and I think it’s also worth posting on my blog.

Starting with version 3.2, Hyperic HQ offers a new feature called HQ Health. You can find it in the Administration menu under Administration -> Plugins -> HQ Health.

HQ Health provides many metrics, such as the server's memory usage, information about cache parameters and load, and a complete list of all HQ Agents connected to this particular HQ Server.
If you experience connection problems between the HQ Server and an Agent, or if HQ shows a platform as unavailable when it is definitely available, HQ Health is the first tool I recommend for debugging the problem.
HQ Health offers the following information about connected Agents:

FQDN

The fully qualified domain name of the platform. This is the platform's identifier within the HQ Server.

Address

The IP address the HQ Server uses to communicate with the Agent.

Port

The port on the Agent side that the HQ Server uses to communicate with the Agent. The default port is 2144. Connections from the HQ Server to this Address and Port must be allowed. It's a good idea to check this by running telnet <Address> <Port> from the HQ Server host. Here is an example of a successful connection:

mpluhar@biollante:/tmp$ telnet 192.168.1.114 2144
Trying 192.168.1.114...
Connected to 192.168.1.114.
Escape character is '^]'.
GET

Connection closed by foreign host.

If the connection times out or is refused, check your HQ Agent configuration as well as your firewall and network configuration.
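If telnet isn't available on the HQ Server host, the same reachability check can be scripted. The following is a minimal sketch that assumes Python 3 is installed on the server host. `agent_port_reachable` is a hypothetical helper name and is not part of Hyperic HQ. It only confirms that a TCP connection to the Agent's Address and Port can be opened. It does not speak the HQ Agent protocol.

```python
import socket


def agent_port_reachable(address: str, port: int = 2144, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to address:port succeeds within `timeout` seconds.

    Hypothetical helper: it performs the same TCP handshake a telnet test does,
    nothing more. 2144 is the default HQ Agent port.
    """
    try:
        # create_connection resolves the address and opens a TCP connection.
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout is an OSError subclass)
        # and name-resolution errors all end up here.
        return False
```

For example, `agent_port_reachable("192.168.1.114")` checks the default port 2144 on the Agent from the transcript. A `False` result points to the same Agent, firewall, or network issues as a telnet timeout.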

Version

Shows the version number of your HQ Agent. Hyperic HQ is backward compatible, so you can run Hyperic HQ Server 3.2.x with 3.1.x Agents, but I strongly encourage you to run the latest version. If you run into problems with Hyperic HQ, upgrade the Server and Agents to the latest version and use HQ Health to verify the version of your Agents.
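If you copy the Version column out of HQ Health, a few lines of Python can flag Agents that lag behind the Server. This is a hypothetical helper, not an HQ feature. It assumes versions look like `3.2.1`, possibly followed by a suffix, and it treats every Agent older than the Server as a candidate for upgrade, even though older Agents may still be supported.

```python
from __future__ import annotations

import re


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a version string such as '3.2.1' into (3, 2, 1).

    Any non-numeric suffix (e.g. the hypothetical '3.2.1-EE') is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def outdated_agents(server_version: str, agents: dict[str, str]) -> list[str]:
    """Return the FQDNs (sorted) of agents whose version is lower than the server's.

    `agents` maps FQDN -> version string, as transcribed from HQ Health (assumption).
    """
    server = parse_version(server_version)
    result = []
    for fqdn, version in agents.items():
        agent = parse_version(version)
        # Pad with zeros so that "3.2" and "3.2.0" compare as equal.
        width = max(len(server), len(agent))
        if agent + (0,) * (width - len(agent)) < server + (0,) * (width - len(server)):
            result.append(fqdn)
    return sorted(result)
```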

Creation Time

Creation Time shows the date on which the platform was initially set up in the Hyperic HQ Server.

#Platforms

The number of platforms this Agent collects data for. Note that an agentless device, such as an SNMP device, also counts as a platform.

#Metrics

The number of metrics an Agent collects. Try to balance the metrics across several HQ Agents, and avoid having a single Agent collect thousands of metrics for all SNMP devices in your network. Doing so creates a single point of failure: if that Agent's platform fails, you can no longer monitor its child devices. Furthermore, if the host is not a dedicated monitoring host, you may degrade the performance of other services running on it.
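To spot an Agent that carries far more metrics than the rest, you can compare each Agent's #Metrics value against the average. The sketch below assumes you have transcribed the #Metrics column into a dictionary keyed by FQDN. The function name and the default threshold of twice the average are arbitrary assumptions, not Hyperic recommendations.

```python
from __future__ import annotations

from statistics import mean


def overloaded_agents(metric_counts: dict[str, int], factor: float = 2.0) -> list[str]:
    """Return agents (sorted by FQDN) whose #Metrics exceeds `factor` times the average.

    `metric_counts` maps FQDN -> #Metrics (assumed transcribed from HQ Health).
    The default factor of 2.0 is a hypothetical threshold; tune it for your setup.
    """
    if not metric_counts:
        return []
    average = mean(metric_counts.values())
    return sorted(fqdn for fqdn, count in metric_counts.items() if count > factor * average)
```

Because the overloaded Agent is part of the average, it raises the threshold it is compared against. With only a handful of Agents, a lower factor may be more useful.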

Time Offset

Time Offset shows the clock offset in milliseconds between the HQ Server and the Agent. Time synchronisation between the HQ Server and the HQ Agents is very important for determining the availability of platforms and services correctly. If you see huge values here, you're in trouble: set up NTP daemons on your Server and Agent hosts. Of course, you can also monitor your NTP daemons with Hyperic HQ and fire an alert if the offset grows too large. Single- or double-digit values are okay. If you see a question mark, your HQ Agent appears to be inactive.
Currently, with Hyperic HQ 3.2.x, it's not possible to run the HQ Agents and the Server in different time zones.
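The rules of thumb for Time Offset can be written down as a small classifier for values copied from that column. This is a sketch that assumes the column shows either a plain number of milliseconds or a question mark. The 100 ms cut-off simply encodes "single- or double-digit values are okay", and the function name and labels are made up for illustration.

```python
def classify_offset(value: str) -> str:
    """Classify a Time Offset cell from HQ Health (hypothetical helper).

    Assumes the cell holds a number of milliseconds or '?'.
    """
    value = value.strip()
    if value == "?":
        return "inactive"      # question mark: the agent appears to be inactive
    offset_ms = float(value)
    if abs(offset_ms) < 100:   # single- or double-digit offsets are okay
        return "ok"
    return "check-ntp"         # large offset: check NTP on server and agent hosts
```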

Written by mirko in: Hyperic
