
Yesterday the OpenNMS book finally arrived. A bit surprising, since we had actually been told to expect it in November, but it could be delivered earlier after all. It reads really well so far, and right at the beginning I had to grin:
OpenNMS does not compete with software for monitoring a handful of servers.
With Nagios you can quickly monitor a small number of servers in a very specific way. With more or less complex Perl scripts you can write highly customized queries, but at the price of poorer scalability and higher resource consumption.
At some point we experienced explosive growth in some of our MSSQL 2005 Express databases. It is a known bug:
http://blogs.msdn.com/sqlblog/archive/2006/09/18/my-database-file-is-mar...
We installed a newer version which should have fixed the bug (according to Microsoft's release notes).
Unfortunately it didn't, so we decided to monitor it. The script below checks whether the database's 'autogrowth' setting is a percentage instead of a fixed size and, if so, whether that percentage is greater than 5. If you don't use the Express edition, you have to adjust the script accordingly (e.g. the SMO assembly names and the instance name).
You need:
* IronPython 2.6
* Net-SNMP for Windows (I run it on port 1161 so that I can still use the Windows SNMP service as well)
# snmpd.conf
extend dbautogrowth cmd /c "C:\Program Files\IronPython 2.6\ipy.exe" e:\monitoring_scripts\check_dbautogrowth.py
# E:\monitoring_scripts\check_dbautogrowth.py

# Configuration
# DBNAME is the name you see in Management Studio, NOT the actual filename
DBNAME = "YourDataBase"

# Don't change anything below this line
import sys
import clr

INFOSTRING = "DBAutogrowth_" + DBNAME

# Load the SMO assemblies shipped with SQL Server 2005 Express
sys.path.append(r"C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE")
clr.AddReferenceToFile('Microsoft.SqlServer.Express.Smo.dll')
clr.AddReferenceToFile('Microsoft.SqlServer.Express.ConnectionInfo.dll')

from Microsoft.SqlServer.Management.Smo import *
from Microsoft.SqlServer.Management.Common import *
import Microsoft.SqlServer.Management.Smo as SMO
import Microsoft.SqlServer.Management.Common as Common

clr.AddReference("System")
from System import *


def main():
    server = Server(r"localhost\SQLEXPRESS")
    db = server.Databases[DBNAME]
    data_file = db.FileGroups["PRIMARY"].Files[0]
    growth = int(data_file.Growth)                   # cast 'float' to 'int'
    growth_type = str(data_file.GrowthType).lower()  # cast FileGrowthType enum to 'str'

    # Print a string so the SNMP client has an idea what this value stands for
    print INFOSTRING
    # The exit code ends up in nsExtendResult: 2 = bad setting, 0 = OK
    if growth_type == 'percent' and growth > 5:
        sys.exit(2)
    else:
        sys.exit(0)


if __name__ == "__main__":
    main()
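The decision the script makes can be pulled out into a plain function, which lets you test it without SQL Server or IronPython. This is a minimal sketch in standard Python 3; the function name `autogrowth_status` and the `limit` parameter are hypothetical, and it assumes SMO's `GrowthType` stringifies to something like "Percent" or "KB", as the script's `.lower()` comparison implies.

```python
# Exit codes as used by the script above: 0 = OK, 2 = CRITICAL
OK = 0
CRITICAL = 2


def autogrowth_status(growth_type, growth, limit=5):
    """Hypothetical helper: return CRITICAL if autogrowth is a percentage
    greater than `limit`, otherwise OK (fixed-size growth is always OK)."""
    if str(growth_type).lower() == "percent" and int(growth) > limit:
        return CRITICAL
    return OK


if __name__ == "__main__":
    print(autogrowth_status("Percent", 10.0))  # percentage above 5 -> 2
    print(autogrowth_status("Percent", 5.0))   # exactly 5 is still accepted -> 0
    print(autogrowth_status("KB", 51200.0))    # fixed growth of 50 MB -> 0
```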
<!-- /etc/opennms/poller-configuration.xml -->
...
<package name="example1">
  ...
  <service name="MSSQL-DB-Autogrowth" interval="300000" user-defined="false" status="on">
    <parameter key="retry" value="2"/>
    <parameter key="timeout" value="3000"/>
    <parameter key="port" value="1161"/>
    <!-- nsExtendResult."dbautogrowth", i.e. the exit code of the script -->
    <parameter key="oid" value=".1.3.6.1.4.1.8072.1.3.2.3.1.4.12.100.98.97.117.116.111.103.114.111.119.116.104"/>
    <parameter key="operator" value="="/>
    <parameter key="operand" value="0"/>
    <parameter key="reason-template" value="The value of automatic database growth changed from a fixed value to a percent value greater than 5. Expected value: fixed growth of 50 MB or lower"/>
  </service>
  ...
</package>
...
<monitor service="MSSQL-DB-Autogrowth" class-name="org.opennms.netmgt.poller.monitors.SnmpMonitor"/>
...
The OpenNMS conference was really good, and I finally got to meet Tarus Balog in person. His keynote alone reminded me that we bet on the right horse with OpenNMS. Here is a brain dump of what I took away from the conference:
It's finally here! At last someone sat down and wrote a book. Even though the wiki is very good, it often lacked a common thread:
Besides several companies, there was also someone from Fraunhofer, someone from the Dutch Ministry of Defence, and even Netways. They apparently look beyond their own backyard too, even though they are behind Nagios/Icinga.
I didn't take away as much as I had hoped from the talk on making OpenNMS highly available. That wasn't the talk's fault (it was really good); I simply knew most of it already.
I was more surprised by the Fraunhofer talk. I had recently been wondering whether it would make sense to write a Net-SNMP subagent that manages various values and states and exposes them via OIDs, similar to what MySQL-SNMP does. Well, Fraunhofer has already done exactly that. As far as I understood, they mainly use it as a remote poller/monitor. Someone asked whether they would release it as free software, and apparently they are going to think about it...
Oh, and of course the subagent could be written in Python :-)
Really interesting was the talk on how to extend OpenNMS without necessarily touching its code. That is what the BSFMonitor is for. Sure, the GeneralPurposePoller has been around forever, but it is meant more as a stopgap: scripts launched via fork-exec don't really scale, and scalability is a core focus of OpenNMS. The BSFMonitor is supposed to fix this because custom scripts all run inside the JVM. The examples shown used Groovy scripts, but other languages such as Python are supposedly supported as well. I had been meaning to look at Groovy for a long time anyway :-)
Hi,
I tried using the existing Net-SNMP module but had no immediate success. To use it on 64-bit systems, you have to change the Makefile. But even after that I only got zeros when walking the SNMP tree. So I decided to go with Python :-)
This is what it will look like:

First, the Python script that gets the values. Deploy it on your load balancer running IPVS and make it executable with chmod +x /opt/ipvs_stats.py.
/opt/ipvs_stats.py
#!/usr/bin/env python
import sys

filename = "/proc/net/ip_vs_stats"

try:
    f = open(filename, 'r')
except IOError:
    print "Sorry, could not read file '" + filename + "'"
    sys.exit(1)
data = f.read()
f.close()


def hex2dec(s):
    """return the integer value of a hexadecimal string s"""
    return int(s, 16)

# first create a list of lists; line 5 holds the per-second rates
data_list = [line.split() for line in data.split('\n')]

stats = {}
stats['total_conns_sec'] = hex2dec(data_list[5][0])
stats['incoming_pkts_sec'] = hex2dec(data_list[5][1])
stats['outgoing_pkts_sec'] = hex2dec(data_list[5][2])
stats['incoming_bytes_sec'] = hex2dec(data_list[5][3])
stats['outgoing_bytes_sec'] = hex2dec(data_list[5][4])

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print "\nError:\tNo arguments given.\n"
        print "Try:"
        for argument in stats:
            print "\t", sys.argv[0], argument
        sys.exit(1)

    # print the requested value to STDOUT, where snmpd picks it up
    if sys.argv[1] in stats:
        print stats[sys.argv[1]]
The script reads the values from /proc/net/ip_vs_stats, and since they are hexadecimal, it has a small function to convert them to integers. The script can deliver 5 values, but in this HowTo I left out the outgoing values because I don't use LVS-NAT in my setup. You can easily add them following the examples here.
The next step is to hook our script into snmpd so we can get the stats via SNMP.
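The parsing step can be tried without a load balancer by feeding it a sample string. This is a sketch in standard Python 3, not the script above: `parse_rates`, `KEYS` and `SAMPLE` are hypothetical names, `SAMPLE` only mimics the layout of /proc/net/ip_vs_stats, and its numbers are made up. Instead of a fixed line index it looks for the "Conns/s" header, so it keeps working if a line shifts.

```python
# Sample shaped like /proc/net/ip_vs_stats; the numbers are made up.
SAMPLE = (
    "   Total Incoming Outgoing         Incoming         Outgoing\n"
    "   Conns  Packets  Packets            Bytes            Bytes\n"
    "      1A      2F4        0            1D3C0                0\n"
    "\n"
    " Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s\n"
    "       3       3E        0             7CB8                0\n"
)

# Same keys and order as the stats dict in ipvs_stats.py
KEYS = ["total_conns_sec", "incoming_pkts_sec", "outgoing_pkts_sec",
        "incoming_bytes_sec", "outgoing_bytes_sec"]


def parse_rates(text):
    """Return the per-second rates as integers (the file stores them in hex)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # The rate values sit on the line right after the "Conns/s" header
        if line.split()[:1] == ["Conns/s"]:
            fields = lines[i + 1].split()
            return {key: int(value, 16) for key, value in zip(KEYS, fields)}
    raise ValueError("no 'Conns/s' header found")


if __name__ == "__main__":
    print(parse_rates(SAMPLE))
```

On a real IPVS box you would pass in `open("/proc/net/ip_vs_stats").read()` instead of `SAMPLE`.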
/etc/snmp/snmpd.conf
# Monitoring IPVS
extend total_conns_sec /opt/ipvs_stats.py total_conns_sec
extend incoming_bytes_sec /opt/ipvs_stats.py incoming_bytes_sec
extend incoming_pkts_sec /opt/ipvs_stats.py incoming_pkts_sec
You can test it with:
snmpwalk -v 2c -c <community> <IP> nsExtendOutLine
NET-SNMP-EXTEND-MIB::nsExtendOutLine."total_conns_sec".1 = STRING: 3
NET-SNMP-EXTEND-MIB::nsExtendOutLine."incoming_pkts_sec".1 = STRING: 62
NET-SNMP-EXTEND-MIB::nsExtendOutLine."incoming_bytes_sec".1 = STRING: 31928
As you can see, I use nsExtendOutLine instead of nsExtendResult. The reason is that our values can easily be greater than 127, and most systems require the exit value to be in the range 0-127 and produce undefined results otherwise.
That is why the script doesn't return the value via sys.exit() but prints it to STDOUT instead.
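To see the difference between the two channels, here is a small sketch that emulates what an `extend` entry captures: the first line of a command's output and its exit status. The helper `run_extend` is hypothetical and only mimics snmpd; it uses standard Python 3 (3.7+ for `capture_output`).

```python
import subprocess
import sys


def run_extend(argv):
    """Hypothetical helper mimicking an snmpd 'extend' entry: run argv and
    return (first line of STDOUT, exit status)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    lines = proc.stdout.splitlines()
    return (lines[0] if lines else ""), proc.returncode


if __name__ == "__main__":
    # A value printed to STDOUT arrives intact (what nsExtendOutLine shows)
    print(run_extend([sys.executable, "-c", "print(31928)"]))
    # A value passed as exit status is cut down to 8 bits on Linux/Unix
    print(run_extend([sys.executable, "-c", "import sys; sys.exit(31928)"]))
```

On Linux the second call reports 184 (31928 modulo 256), so a byte counter passed through the exit status would be silently wrong.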
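The OID index is the ASCII representation of the name you chose after the "extend" keyword, prefixed with the name's length (15 for "total_conns_sec"); the trailing .1 is the output line number. To see the numeric OIDs, add -On to your snmpwalk:
snmpwalk -On -v 2c -c <community> <IP> nsExtendOutLine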
.1.3.6.1.4.1.8072.1.3.2.4.1.2.15.116.111.116.97.108.95.99.111.110.110.115.95.115.101.99.1 = STRING: 4
.1.3.6.1.4.1.8072.1.3.2.4.1.2.17.105.110.99.111.109.105.110.103.95.112.107.116.115.95.115.101.99.1 = STRING: 74
.1.3.6.1.4.1.8072.1.3.2.4.1.2.18.105.110.99.111.109.105.110.103.95.98.121.116.101.115.95.115.101.99.1 = STRING: 33322
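If you add more extend entries (e.g. the outgoing values), you can compute their OIDs instead of walking for them. A minimal sketch in standard Python 3, assuming ASCII-only extend names; `string_index` and `extend_oid` are hypothetical helper names, while the two base OIDs are the nsExtendOutLine and nsExtendResult columns that already appear in this post.

```python
# nsExtendOutLine column (NET-SNMP-EXTEND-MIB), as seen in the walk above
NS_EXTEND_OUT_LINE = ".1.3.6.1.4.1.8072.1.3.2.4.1.2"
# nsExtendResult column, as used in the MSSQL poller configuration
NS_EXTEND_RESULT = ".1.3.6.1.4.1.8072.1.3.2.3.1.4"


def string_index(name):
    """Encode a string as an SNMP table index: its length followed by one
    sub-identifier per character code (assumes ASCII names)."""
    return ".".join([str(len(name))] + [str(ord(c)) for c in name])


def extend_oid(base, name):
    """Column OID for the extend entry `name` (without an instance suffix)."""
    return base + "." + string_index(name)


if __name__ == "__main__":
    # Column OID for datacollection-config.xml (instance="1" is set there)
    print(extend_oid(NS_EXTEND_OUT_LINE, "outgoing_bytes_sec"))
    # Full OID of the first output line, as printed by snmpwalk -On
    print(extend_oid(NS_EXTEND_OUT_LINE, "outgoing_bytes_sec") + ".1")
```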
Now we can configure OpenNMS to collect the data:
/etc/opennms/datacollection-config.xml
...
...
...
<group name="ipvs" ifType="ignore">
<mibObj oid=".1.3.6.1.4.1.8072.1.3.2.4.1.2.15.116.111.116.97.108.95.99.111.110.110.115.95.115.101.99"
instance="1" alias="ipvsTotalConnsSec" type="octetstring" />
<mibObj oid=".1.3.6.1.4.1.8072.1.3.2.4.1.2.17.105.110.99.111.109.105.110.103.95.112.107.116.115.95.115.101.99"
instance="1" alias="ipvsPktsSecIn" type="octetstring" />
<mibObj oid=".1.3.6.1.4.1.8072.1.3.2.4.1.2.18.105.110.99.111.109.105.110.103.95.98.121.116.101.115.95.115.101.99"
instance="1" alias="ipvsBytesSecIn" type="octetstring" />
</group>
...
...
...
<systemDef name="Net-SNMP">
<sysoidMask>.1.3.6.1.4.1.8072.3.</sysoidMask>
<collect>
<includeGroup>mib2-host-resources-system</includeGroup>
<includeGroup>mib2-host-resources-memory</includeGroup>
<includeGroup>mib2-X-interfaces</includeGroup>
<includeGroup>net-snmp-disk</includeGroup>
<includeGroup>openmanage-coolingdevices</includeGroup>
<includeGroup>openmanage-temperatureprobe</includeGroup>
<includeGroup>openmanage-powerusage</includeGroup>
<includeGroup>ucd-loadavg</includeGroup>
<includeGroup>ucd-memory</includeGroup>
<includeGroup>ucd-sysstat</includeGroup>
<includeGroup>ucd-sysstat-raw</includeGroup>
<includeGroup>ucd-sysstat-raw-more</includeGroup>
<!-- <ipvs> -->
<includeGroup>ipvs</includeGroup>
<!-- </ipvs> -->
</collect>
</systemDef>
...
...
...
Note the type "octetstring". If you look at the type of this OID in the walk above, you'll see it is "STRING". RRDtool and JRobin can't store string data, so the value needs to be converted to a number. Setting the type to "octetstring" does exactly that (it is converted to a gauge).
And finally we can build pretty graphs from it:
/etc/opennms/snmp-graph.properties
...
...
...
reports=...\
..., \
ipvs, ipvs.incoming.bytes, ipvs.incoming.packets, \
...
...
...
report.ipvs.name=IPVS Stats
report.ipvs.columns=ipvsTotalConnsSec
report.ipvs.type=nodeSnmp
report.ipvs.command=--title="IPVS Stats" \
 DEF:totalconns={rrd1}:ipvsTotalConnsSec:AVERAGE \
 LINE2:totalconns#DE0056:"Total Connections/sec" \
 GPRINT:totalconns:AVERAGE:"Avg \\: %10.2lf %s"

report.ipvs.incoming.bytes.name=IPVS Stats Incoming Bytes
report.ipvs.incoming.bytes.columns=ipvsBytesSecIn
report.ipvs.incoming.bytes.type=nodeSnmp
report.ipvs.incoming.bytes.command=--title="IPVS Stats - Incoming Bytes" \
 DEF:bytes={rrd1}:ipvsBytesSecIn:AVERAGE \
 LINE2:bytes#DE0056:"Bytes/sec" \
 GPRINT:bytes:AVERAGE:"Avg \\: %10.2lf %s"

report.ipvs.incoming.packets.name=IPVS Stats Incoming Packets
report.ipvs.incoming.packets.columns=ipvsPktsSecIn
report.ipvs.incoming.packets.type=nodeSnmp
report.ipvs.incoming.packets.command=--title="IPVS Stats - Incoming Packets" \
 DEF:packets={rrd1}:ipvsPktsSecIn:AVERAGE \
 LINE2:packets#DE0056:"Packets/sec" \
 GPRINT:packets:AVERAGE:"Avg \\: %10.2lf %s"
...
...
...
Python port of a Perl script I uploaded to NagiosExchange last year.
The script should be run against the PRINCIPAL with a read-only user.
If you want to run it against the MIRROR, the user needs the sysadmin role there (ask Microsoft for the reason); otherwise you get NULL.
You have to install the pymssql module manually if it isn't shipped with your distro.
#!/usr/bin/python
import optparse
import sys

import pymssql


def main(host, user, password, database):
    # Connect to MSSQL Server
    try:
        con = pymssql.connect(host=host, user=user, password=password,
                              database=database)
        cur = con.cursor()
    except (TypeError, pymssql.Error):
        print
        print "Could not connect to SQL Server"
        print
        sys.exit(1)

    # Execute query which checks if the database is mirrored
    # (parameterized instead of string concatenation)
    query = """SELECT d.name, m.mirroring_role_desc, m.mirroring_state_desc
               FROM sys.database_mirroring m
               JOIN sys.databases d ON m.database_id = d.database_id
               WHERE m.mirroring_state_desc IS NOT NULL
               AND d.name = %s"""
    cur.execute(query, (database,))
    results = cur.fetchall()

    exit_val = 2
    if results:
        name, role, state = results[-1]
        if role == "PRINCIPAL" and state == "SYNCHRONIZED":
            exit_val = 0

    if exit_val == 0:
        print "OK", "-", name, "-", role, "-", state
    else:
        print "CRITICAL - Check the mirrored database"

    con.close()
    # Hand the status back to Nagios (0 = OK, 2 = CRITICAL)
    sys.exit(exit_val)


if __name__ == "__main__":
    # Command line options
    parser = optparse.OptionParser()
    parser.add_option("-H", "--host", dest="host", metavar="HOST",
                      help="IP or hostname with the mirrored database")
    parser.add_option("-d", "--database", dest="database", metavar="DB",
                      help="Name of the mirrored database")
    parser.add_option("-u", "--user", dest="user", metavar="USER",
                      help="User to login")
    parser.add_option("-p", "--password", dest="password", metavar="PW",
                      help="Password of the user")

    # Without any arguments, show the help text (optparse exits after it)
    if len(sys.argv) < 2:
        parser.parse_args(["-h"])
    (options, args) = parser.parse_args()

    main(options.host, options.user, options.password, options.database)
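The OK/CRITICAL decision only depends on the rows the query returns, so it can be isolated and tested without a SQL Server. A sketch in standard Python 3; `mirroring_status` is a hypothetical helper name and the rows in the usage example are made up, though "PRINCIPAL", "MIRROR", "SYNCHRONIZED" and "SUSPENDED" are values SQL Server reports in these columns.

```python
OK, CRITICAL = 0, 2


def mirroring_status(rows):
    """rows: (name, role, state) tuples as returned by the query above.
    Returns (exit_code, message) in Nagios plugin style."""
    if rows:
        name, role, state = rows[-1]
        if role == "PRINCIPAL" and state == "SYNCHRONIZED":
            return OK, "OK - %s - %s - %s" % (name, role, state)
    return CRITICAL, "CRITICAL - Check the mirrored database"


if __name__ == "__main__":
    print(mirroring_status([("YourDataBase", "PRINCIPAL", "SYNCHRONIZED")]))
    print(mirroring_status([("YourDataBase", "PRINCIPAL", "SUSPENDED")]))
    print(mirroring_status([]))  # database not mirrored at all
```

Keeping this logic in a pure function means the database-facing part of the plugin shrinks to "run the query, pass the rows in, print the message, exit with the code".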