- <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN" >
- <book>
- <title>Nagios Plug-in Developer Guidelines</title>
- <bookinfo>
- <authorgroup>
- <author>
- <affiliation>
- <orgname>Nagios Plugins Development Team</orgname>
- </affiliation>
- </author>
- </authorgroup>
- <pubdate>2005</pubdate>
- <title>Nagios plug-in development guidelines</title>
-
- <revhistory>
- <revision>
- <revnumber>$Revision$</revnumber>
- <date>$Date$</date>
- </revision>
- </revhistory>
- <copyright>
- <year>2000 - 2005</year>
- <holder>Nagios Plugins Development Team</holder>
- </copyright>
- </bookinfo>
- <preface id="preface"><title>Preface</title>
- <para>The purpose of these guidelines is to provide a reference for
- plug-in developers and to encourage the standardization of the
- different kinds of plug-ins: C, shell, Perl, Python, etc.</para>
- <para>Nagios Plug-in Development Guidelines Copyright (C) 2000-2005
- (Nagios Plugins Team)</para>
- <para>Permission is granted to make and distribute verbatim
- copies of this manual provided the copyright notice and this
- permission notice are preserved on all copies.</para>
- <para>The plugins themselves are copyrighted by their respective
- authors.</para>
- </preface>
- <article>
- <section id="DevRequirements"><title>Development platform requirements</title>
- <para>
- Nagios plugins are developed to the GNU standard, so any OS which is supported by GNU
- should run the plugins. While the requirements for compiling a Nagios plugins release
- are very small, developing from CVS requires additional software to be installed. These are the
- minimum versions of software required:
- <literallayout>
- gnu make 3.79
- automake 1.8
- autoconf 2.58
- gettext 0.11.5
- </literallayout>
- To compile from CVS, after you have checked out the code, run:
- <literallayout>
- tools/setup
- ./configure
- make
- make install
- </literallayout>
- </para>
- </section>
- <section id="PlugOutput"><title>Plugin Output for Nagios</title>
-
- <para>You should always print something to STDOUT that tells if the
- service is working or why it is failing. Try to keep the output short -
- probably less than 80 characters. Remember that you ideally would like
- the entire output to appear in a pager message, which will get chopped
- off after a certain length.</para>
- <section><title>Print only one line of text</title>
- <para>Nagios will only grab the first line of text from STDOUT
- when it notifies contacts about potential problems. If you print
- multiple lines, you're out of luck. Remember, keep it short and
- to the point.</para>
- <para>Output should be in the format:</para>
- <literallayout>
- METRIC STATUS: Information text
- </literallayout>
- <para>However, note that this is not a requirement of the API, so you cannot depend on this
- being an accurate reflection of the status of the service - the status should always
- be determined by the return code.</para>
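- <para>A minimal sketch of the output convention above, written in Python
- for illustration. The function names are hypothetical and not part of any
- plugin library; the exit codes are the ones listed in the Plugin Return
- Codes table of this document.</para>

```python
# Status names and exit codes as listed in the "Plugin Return Codes" table.
STATUS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_status_line(metric, status, info):
    """Build the single 'METRIC STATUS: Information text' output line."""
    if status not in STATUS_CODES:
        raise ValueError("unknown status: %r" % (status,))
    # Collapse any newlines so that only one line ever reaches STDOUT.
    text = " ".join(str(info).split())
    return "%s %s: %s" % (metric, status, text)


def report(metric, status, info):
    """Print the status line and return the exit code the plugin should use."""
    print(format_status_line(metric, status, info))
    return STATUS_CODES[status]
```

- <para>A plugin would typically finish with something like
- sys.exit(report("DISK", "OK", "free space: / 3326 MB")), so that the text
- and the return code can never disagree.</para>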
- </section>
- <section><title>Verbose output</title>
- <para>Use the -v flag for verbose output. You should allow multiple
- -v options for additional verbosity, up to a maximum of 3. The standard
- type of output should be:</para>
- <table id="verboselevels"><title>Verbose output levels</title>
- <tgroup cols="2">
- <thead>
- <row>
- <entry><para>Verbosity level</para></entry>
- <entry><para>Type of output</para></entry>
- </row>
- </thead>
- <tbody>
- <row>
- <entry align="center"><para>0</para></entry>
- <entry><para>Single line, minimal output. Summary</para></entry>
- </row>
- <row>
- <entry align="center"><para>1</para></entry>
- <entry><para>Single line, additional information (eg list processes that fail)</para></entry>
- </row>
- <row>
- <entry align="center"><para>2</para></entry>
- <entry><para>Multi line, configuration debug output (eg ps command used)</para></entry>
- </row>
- <row>
- <entry align="center"><para>3</para></entry>
- <entry><para>Lots of detail for plugin problem diagnosis</para></entry>
- </row>
- </tbody>
- </tgroup>
- </table>
- </section>
- <section><title>Screen Output</title>
- <para>The plug-in should print the diagnostic and just the
- synopsis part of the help message. A well written plugin would
- then have --help as a way to get the verbose help.</para>
- <para>Code and output should try to respect the 80x25 size of a
- crt (remember when fixing stuff in the server room!)</para>
- </section>
-
- <section><title>Return the proper status code</title>
- <para>See <xref linkend="ReturnCodes"> below
- for the numeric values of status codes and their
- description. Remember to return an UNKNOWN state if bogus or
- invalid command line arguments are supplied or if you are unable
- to check the service.</para>
- </section>
-
- <section><title>Plugin Return Codes</title>
- <para>The return codes below are based on the POSIX spec of returning
- a positive value. Netsaint prior to v0.0.7 supported a non-POSIX-compliant
- return code of "-1" for unknown. Nagios supports POSIX return
- codes by default.</para>
- <para>Note: Some plugins will on occasion print on STDOUT that an error
- occurred and that the error code is 138 or 255 or some such number. These
- are usually caused by plugins using system commands without having
- enough checks to catch unexpected output. Developers should include a
- default catch-all for system command output that returns an UNKNOWN
- return code.</para>
-
- <table id="ReturnCodes"><title>Plugin Return Codes</title>
- <tgroup cols="3">
- <thead>
- <row>
- <entry><para>Numeric Value</para></entry>
- <entry><para>Service Status</para></entry>
- <entry><para>Status Description</para></entry>
- </row>
- </thead>
- <tbody>
- <row>
- <entry align="center"><para>0</para></entry>
- <entry valign="middle"><para>OK</para></entry>
- <entry><para>The plugin was able to check the service and it
- appeared to be functioning properly</para></entry>
- </row>
- <row>
- <entry align="center"><para>1</para></entry>
- <entry valign="middle"><para>Warning</para></entry>
- <entry><para>The plugin was able to check the service, but it
- appeared to be above some "warning" threshold or did not appear
- to be working properly</para></entry>
- </row>
- <row>
- <entry align="center"><para>2</para></entry>
- <entry valign="middle"><para>Critical</para></entry>
- <entry><para>The plugin detected that either the service was not
- running or it was above some "critical" threshold</para></entry>
- </row>
- <row>
- <entry align="center"><para>3</para></entry>
- <entry valign="middle"><para>Unknown</para></entry>
- <entry><para>Invalid command line arguments were supplied to the
- plugin or the plugin was unable to check the status of the given
- hosts/service</para></entry>
- </row>
- </tbody>
- </tgroup>
- </table>
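- <para>The catch-all described in the note above could look roughly like
- this in a Python plugin. This is only a sketch: the command and the
- "running" and "stopped" output strings are hypothetical, and the 10 second
- limit is an assumption.</para>

```python
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(argv):
    """Run a system command and map its result onto a plugin return code."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError) as exc:
        # Command missing, not executable, or timed out.
        return UNKNOWN, "could not run command: %s" % exc
    output = result.stdout.strip()
    if result.returncode == 0 and output == "running":   # hypothetical output
        return OK, "service is running"
    if result.returncode == 0 and output == "stopped":   # hypothetical output
        return CRITICAL, "service is stopped"
    # Default catch-all: never pass a raw status such as 138 or 255 through.
    return UNKNOWN, "unexpected result (exit status %d)" % result.returncode


def main(argv):
    """Print the message and exit with the mapped return code."""
    code, message = classify(argv)
    print(message)
    sys.exit(code)
```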
-
- </section>
- <section id="thresholdformat"><title>Threshold range format</title>
- <para>Threshold ranges define the warning and critical levels for plugins to
- alert on. The theory is that the plugin will do some sort of check which returns
- a numerical value, or metric, which is then compared to the warning and
- critical thresholds.
- This is the generalised format for threshold ranges:</para>
- <literallayout>
- [@]start:end
- </literallayout>
-
- <para>Notes:</para>
- <orderedlist>
- <listitem><para>start ≤ end</para>
- </listitem>
- <listitem><para>start and ":" are not required if start=0</para>
- </listitem>
- <listitem><para>if range is of format "start:" and end is not specified,
- assume end is infinity</para>
- </listitem>
- <listitem><para>to specify negative infinity, use "~"</para>
- </listitem>
- <listitem><para>alert is raised if metric is outside start and end range
- (inclusive of endpoints)</para>
- </listitem>
- <listitem><para>if range starts with "@", then alert if inside this range
- (inclusive of endpoints)</para>
- </listitem>
- </orderedlist>
-
- <para>Note: Not all plugins are coded to expect ranges in this format. It is
- planned for a future release to
- provide standard libraries to parse and compare metrics against ranges. There
- will also be some work in providing multiple metrics.</para>
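- <para>Until such a library exists, the notes above can be sketched in
- Python as follows. The function names are hypothetical. The sketch assumes
- that the endpoints belong to the range, so a plain range alerts only on
- values strictly outside it and an "@" range alerts on values from start to
- end inclusive.</para>

```python
import math


def parse_range(spec):
    """Parse '[@]start:end' into (start, end, alert_inside)."""
    if not spec:
        raise ValueError("empty threshold range")
    alert_inside = spec.startswith("@")
    if alert_inside:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        # start and ":" are not required if start=0
        start_text, end_text = "0", spec
    start = -math.inf if start_text == "~" else float(start_text)
    # "start:" with no end means the end is infinity
    end = math.inf if end_text == "" else float(end_text)
    if start > end:
        raise ValueError("start must not be greater than end: %r" % spec)
    return start, end, alert_inside


def should_alert(value, spec):
    """Return True if the metric value should raise an alert for this range."""
    start, end, alert_inside = parse_range(spec)
    inside = start <= value <= end
    return inside if alert_inside else not inside
```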
- </section>
- <section><title>Performance data</title>
- <para>Performance data is defined by Nagios as "everything after the | of the plugin output" -
- please refer to Nagios documentation for information on capturing this data to logfiles.
- However, it is the responsibility of the plugin writer to ensure the
- performance data is in a "Nagios plugins" format.
- This is the expected format:</para>
- <literallayout>
- 'label'=value[UOM];[warn];[crit];[min];[max]
- </literallayout>
- <para>Notes:</para>
- <orderedlist>
- <listitem><para>space separated list of label/value pairs</para>
- </listitem>
- <listitem><para>label can contain any characters</para>
- </listitem>
- <listitem><para>the single quotes for the label are optional. Required if
- spaces, = or ' are in the label</para>
- </listitem>
- <listitem><para>label length is arbitrary, but ideally the first 19 characters
- are unique (due to a limitation in RRD). Be aware of a limitation in the
- amount of data that NRPE returns to Nagios</para>
- </listitem>
- <listitem><para>to specify a quote character, use two single quotes</para>
- </listitem>
- <listitem><para>warn, crit, min or max may be null (for example, if the threshold is
- not defined or min and max do not apply). Trailing unfilled semicolons can be
- dropped</para>
- </listitem>
- <listitem><para>min and max are not required if UOM=%</para>
- </listitem>
- <listitem><para>value, min and max in class [-0-9.]. Must all be the
- same UOM</para>
- </listitem>
- <listitem><para>warn and crit are in the range format (see
- <xref linkend="thresholdformat">). Must be the same UOM</para>
- </listitem>
- <listitem><para>UOM (unit of measurement) is one of:</para>
- <orderedlist>
- <listitem><para>no unit specified - assume a number (int or float)
- of things (eg, users, processes, load averages)</para>
- </listitem>
- <listitem><para>s - seconds (also us, ms)</para></listitem>
- <listitem><para>% - percentage</para></listitem>
- <listitem><para>B - bytes (also KB, MB, TB)</para></listitem>
- <listitem><para>c - a continuous counter (such as bytes
- transmitted on an interface)</para></listitem>
- </orderedlist>
- </listitem>
- </orderedlist>
- <para>It is up to third party programs to convert the Nagios plugins
- performance data into graphs.</para>
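- <para>As an illustration of the format above, a Python plugin might build
- its performance data with helpers like these. The names are hypothetical;
- the sketch only applies the quoting and trailing-semicolon rules listed in
- the notes and does not validate values or units.</para>

```python
def quote_label(label):
    """Quote a label if it contains a space, '=' or a single quote."""
    if any(ch in label for ch in " ='"):
        # A quote character inside the label is written as two single quotes.
        return "'" + label.replace("'", "''") + "'"
    return label


def perfdata_item(label, value, uom="", warn=None, crit=None,
                  minimum=None, maximum=None):
    """Format one 'label'=value[UOM];[warn];[crit];[min];[max] item."""
    fields = ["%s=%s%s" % (quote_label(label), value, uom)]
    for field in (warn, crit, minimum, maximum):
        fields.append("" if field is None else str(field))
    # Trailing unfilled semicolons can be dropped.
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return ";".join(fields)


def perfdata(items):
    """Join items into the space separated list that follows the '|'."""
    return " ".join(items)
```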
- </section>
- <section><title>Translations</title>
- <para>If possible, use translation tools for all output. Currently, most of the core C plugins
- use gettext for translation. General guidelines are:</para>
- <orderedlist>
- <listitem><para>short help is not translated</para></listitem>
- <listitem><para>long help has options in English language, but text translated</para></listitem>
- <listitem><para>"Copyright" kept in English</para></listitem>
- <listitem><para>copyright holder names kept in original text</para></listitem>
- </orderedlist>
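- <para>The core C plugins use gettext directly; for a plugin written in
- Python, the same guidelines might be followed with the standard gettext
- module, roughly as below. The text domain "check_example" and the help
- strings are hypothetical.</para>

```python
import gettext

# "check_example" is a hypothetical text domain. With fallback=True the
# original English strings are returned when no catalogue is installed.
_translation = gettext.translation("check_example", fallback=True)
_ = _translation.gettext


def short_help():
    """Short help (usage) is not translated."""
    return "Usage: check_example -H <host> [-w <warn>] [-c <crit>]"


def long_help_line(option, description):
    """Keep the option itself in English, translate only the text."""
    return "  %-16s %s" % (option, _(description))
```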
- </section>
- </section>
- <section id="SysCmdAuxFiles"><title>System Commands and Auxiliary Files</title>
- <section><title>Don't execute system commands without specifying their
- full path</title>
- <para>Don't use exec(), popen(), etc. to execute external
- commands without explicitly using the full path of the external
- program.</para>
- <para>Doing otherwise makes the plugin vulnerable to hijacking
- by a trojan horse earlier in the search path. See the main
- plugin distribution for examples on how this is done.</para>
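- <para>The main distribution shows how this is done in C. For comparison,
- a Python plugin could enforce the same rule with a small wrapper such as
- the following sketch; the function names and the 10 second limit are
- assumptions.</para>

```python
import os
import subprocess


def is_full_path(program):
    """Return True if the program is named by an absolute path."""
    return os.path.isabs(program)


def run_with_full_path(argv):
    """Run an external command, refusing any program not given by full path."""
    if not is_full_path(argv[0]):
        raise ValueError("external program must be given by full path: %r"
                         % (argv[0],))
    # An argument list (no shell) with an absolute path means the search
    # path is never consulted to locate the program.
    return subprocess.run(argv, capture_output=True, text=True, timeout=10)
```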
- </section>
- <section><title>Use spopen() if external commands must be executed</title>
- <para>If you have to execute external commands from within your
- plugin and you're writing it in C, use the spopen() function
- that Karl DeBisschop has written.</para>
- <para>The code for spopen() and spclose() is included with the
- core plugin distribution.</para>
- </section>
- <section><title>Don't make temp files unless absolutely required</title>
- <para>If temp files are needed, make sure that the plugin will
- fail cleanly if the file can't be written (e.g., too few file
- handles, out of disk space, incorrect permissions, etc.) and
- delete the temp file when processing is complete.</para>
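- <para>One way to get this behaviour in a Python plugin is sketched below.
- The helper name and the file prefix are hypothetical, and reporting a
- write failure as UNKNOWN is an assumption based on the return code
- table.</para>

```python
import os
import tempfile

OK, UNKNOWN = 0, 3


def with_scratch_file(data, work):
    """Write data to a temp file, call work(path), and always delete the file.

    Returns (OK, result of work) or (UNKNOWN, message) if the file
    could not be created or written.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="check_example.")
    except OSError as exc:
        return UNKNOWN, "cannot create temp file: %s" % exc
    try:
        try:
            with os.fdopen(fd, "w") as handle:
                handle.write(data)
        except OSError as exc:
            return UNKNOWN, "cannot write temp file: %s" % exc
        return OK, work(path)
    finally:
        # Runs on success and on failure, so the file never lingers.
        os.unlink(path)
```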
- </section>
- <section><title>Don't be tricked into following symlinks</title>
- <para>If your plugin opens any files, take steps to ensure that
- you are not following a symlink to another location on the
- system.</para>
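- <para>On POSIX systems one such step is the O_NOFOLLOW flag of open(),
- shown here in a Python sketch. The function name is hypothetical, and the
- flag only protects the final path component, not symlinks in parent
- directories.</para>

```python
import os


def open_no_symlink(path):
    """Open path read-only, refusing to follow a symlink at the final component."""
    # With O_NOFOLLOW, open() fails with an OSError if path is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    return os.fdopen(fd, "r")
```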
- </section>
- <section><title>Validate all input</title>
- <para>Use the routines in utils.c or utils.pm, and write more as needed.</para>
- </section>
- </section>
-
- <section id="PerlPlugin"><title>Perl Plugins</title>
- <para>Perl plugins are coded a little more defensively than other
- plugins because of embedded Perl. When configured as such, embedded
- Perl Nagios (ePN) requires stricter use of some of Perl's features.
- This section outlines some of the steps needed to use ePN
- effectively.</para>
-
- <orderedlist>
-
- <listitem><para> Do not use BEGIN and END blocks since they will be called
- only once (when Nagios starts and shuts down) with Embedded Perl (ePN). In
- particular, do not use BEGIN blocks to initialize variables.</para>
- </listitem>
-
- <listitem><para>To use utils.pm, you need to provide a full path to the
- module in order for it to work.</para>
-
- <literallayout>
- e.g.
- use lib "/usr/local/nagios/libexec";
- use utils qw(...);
- </literallayout>
- </listitem>
- <listitem><para>Perl scripts should be called with "-w"</para>
- </listitem>
-
- <listitem><para>All Perl plugins must compile cleanly under "use strict" - i.e. at
- least explicitly use package names as in "$main::x" or predeclare every
- variable. </para>
-
- <para>Explicitly initialize each variable in use. Otherwise with
- caching enabled, the plugin will not be recompiled each time, and
- therefore Perl will not reinitialize all the variables. All old
- variable values will still be in effect.</para>
- </listitem>
-
- <listitem><para>Do not use &lt;DATA&gt; handles (these simply do not compile under ePN).</para>
- </listitem>
- <listitem><para>Do not use global variables in named subroutines. This is bad practise anyway, but with ePN the
- compiler will report an error "<global_var> will not stay shared ..". Values used by
- subroutines should be passed in the argument list.</para>
- </listitem>
- <listitem><para>If writing to a file (perhaps recording
- performance data) explicitly close it. The plugin never
- calls <emphasis role="strong">exit</emphasis>; that is caught by
- p1.pl, so output streams are never closed.</para>
- </listitem>
-
- <listitem><para>As in <xref linkend="runtime"> all plugins need
- to monitor their runtime, especially if they are using network
- resources. Use of <emphasis>alarm</emphasis> is recommended,
- noting that some Perl modules (eg LWP) manage timers, so that an alarm
- set by a plugin using such a module is overwritten by the module
- (workarounds are cunning (TM), or use the module's timer).
- Plugins may import a default time out ($TIMEOUT) from utils.pm.
- </para>
- </listitem>
- <listitem><para>Perl plugins should import %ERRORS from utils.pm
- and then "exit $ERRORS{'OK'}" rather than "exit 0"
- </para>
- </listitem>
-
- </orderedlist>
-
- </section>
- <section id="runtime"><title>Runtime Timeouts</title>
- <para>Plugins have a very limited runtime - typically 10 sec.
- As a result, it is very important for plugins to maintain internal
- code to exit if runtime exceeds a threshold. </para>
- <para>All plugins should time out gracefully, not just networking
- plugins. For instance, df may lock if you have automounted
- drives and your network fails - but at first glance, who'd think
- df could lock up like that? Besides, a plugin that is able to time
- out is more error resistant than one that just keeps consuming
- resources.</para>
-
- <section><title>Use DEFAULT_SOCKET_TIMEOUT</title>
- <para>All network plugins should use DEFAULT_SOCKET_TIMEOUT to time out.</para>
- </section>
-
- <section><title>Add alarms to network plugins</title>
- <para>If you write a plugin which communicates with another
- networked host, you should make sure to set an alarm() in your
- code that prevents the plugin from hanging due to abnormal
- socket closures, etc. Nagios takes steps to protect itself
- against unruly plugins that time out, but any plugins you create
- should be well behaved on their own.</para>
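- <para>In a Python plugin, an alarm of this kind might be set up as in
- the sketch below. The names are hypothetical, and the code assumes a POSIX
- system and that it runs in the main thread, since that is where Python
- delivers signals.</para>

```python
import signal


class PluginTimeout(Exception):
    """Raised when the check runs longer than its time limit."""


def _on_alarm(signum, frame):
    raise PluginTimeout()


def run_with_alarm(check, seconds):
    """Call check(), abandoning it after the given number of whole seconds."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return check()
    finally:
        signal.alarm(0)                          # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

- <para>The caller would catch PluginTimeout, print a one-line message and
- exit with a suitable return code, for example UNKNOWN because the service
- could not be checked.</para>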
- </section>
-
- </section>
- <section id="PlugOptions"><title>Plugin Options</title>
-
- <para>A well written plugin should have --help as a way to get
- verbose help. Code and output should try to respect the 80x25 size of a
- crt (remember when fixing stuff in the server room!)</para>
-
- <section><title>Option Processing</title>
- <para>For plugins written in C, we recommend the C standard
- getopt library for short options. Getopt_long is always available.
- </para>
- <para>For plugins written in Perl, we recommend the Getopt::Long module.</para>
- <para>Positional arguments are strongly discouraged.</para>
- <para>There are a few reserved options that should not be used
- for other purposes:</para>
- <literallayout>
- -V version (--version)
- -h help (--help)
- -t timeout (--timeout)
- -w warning threshold (--warning)
- -c critical threshold (--critical)
- -H hostname (--hostname)
- -v verbose (--verbose)
- </literallayout>
- <para>In addition to the reserved options above, some other standard options are:</para>
- <literallayout>
- -C SNMP community (--community)
- -a authentication password (--authentication)
- -l login name (--logname)
- -p port or password (--port or --passwd/--password)
- -u url or username (--url or --username)
- </literallayout>
-
- <para>Look at check_pgsql and check_procs to see how I currently
- think this can work. The standard options should behave as follows:</para>
-
- <para>The option -V or --version should be present in all
- plugins. For C plugins it should result in a call to print_revision, a
- function in utils.c which takes two character string arguments, the
- command name and the plugin revision.</para>
- <para>The -? option, or any other unparsable set of options,
- should print out a short usage statement. Line width should
- be 80 characters or less and no more than 23 lines should be printed (it
- should display cleanly on a dumb terminal in a server
- room).</para>
- <para>The option -h or --help should be present in all plugins.
- In C plugins, it should result in a call to print_help (or
- equivalent). The function print_help should call print_revision,
- then print_usage, then should provide detailed
- help. Help text should fit on an 80-character width display, but
- may run as many lines as needed.</para>
- <para>The option -v or --verbose should be present in all plugins.
- The user should be allowed to specify -v multiple times to increase
- the verbosity level, as described in <xref linkend="verboselevels">.</para>
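- <para>The guidelines above name getopt for C and Getopt::Long for Perl.
- For a Python plugin, the reserved options could be wired up with the
- standard argparse module, as in this sketch. The program name, version
- string and default timeout are hypothetical. argparse adds -h/--help by
- itself, and its error handler is overridden here so that unparsable
- options print the short usage and exit with UNKNOWN.</para>

```python
import argparse

UNKNOWN = 3
MAX_VERBOSITY = 3


class PluginArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that reports bad options with the UNKNOWN return code."""

    def error(self, message):
        # By default argparse exits with status 2, which would read as CRITICAL.
        self.print_usage()
        self.exit(UNKNOWN, "%s: %s\n" % (self.prog, message))


def build_parser():
    parser = PluginArgumentParser(prog="check_example",
                                  description="Hypothetical example check.")
    parser.add_argument("-V", "--version", action="version",
                        version="check_example 0.1")
    parser.add_argument("-t", "--timeout", type=int, default=10)
    parser.add_argument("-w", "--warning")
    parser.add_argument("-c", "--critical")
    parser.add_argument("-H", "--hostname")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


def parse_options(argv):
    """Parse argv and cap the verbosity at the documented maximum of 3."""
    args = build_parser().parse_args(argv)
    args.verbose = min(args.verbose, MAX_VERBOSITY)
    return args
```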
- </section>
- <section>
- <title>Plugins with more than one type of threshold, or with
- threshold ranges</title>
- <para>Old style was to do things like -ct for critical time and
- -cv for critical value. That goes out the window with POSIX
- getopt. The allowable alternatives are:</para>
- <orderedlist>
- <listitem>
- <para>long options like -critical-time (or -ct and -cv, I
- suppose).</para>
- </listitem>
- <listitem>
- <para>repeated options like `check_load -w 10 -w 6 -w 4 -c
- 16 -c 10 -c 10`</para>
- </listitem>
- <listitem>
- <para>for brevity, the above can be expressed as `check_load
- -w 10,6,4 -c 16,10,10`</para>
- </listitem>
- <listitem>
- <para>ranges are expressed with colons as in `check_procs -C
- httpd -w 1:20 -c 1:30` which will warn above 20 instances,
- and critical at 0 and above 30</para>
- </listitem>
- <listitem>
- <para>lists are expressed with commas, so Jacob's check_nmap
- uses constructs like '-p 1000,1010,1050:1060,2000'</para>
- </listitem>
- <listitem>
- <para>If possible when writing lists, use tokens to make the
- list easy to remember and non-order dependent - so
- check_disk uses '-c 10000,10%' so that it is clear which is
- the percentage and which is the KB values (note that due to
- my own lack of foresight, that used to be '-c 10000:10%' but
- such constructs should all be changed for consistency,
- though providing backward compatibility is fairly
- easy).</para>
- </listitem>
- </orderedlist>
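- <para>The list conventions above are easy to parse. The following Python
- sketch is illustrative only: the function names are hypothetical, and
- treating a colon pair inside a port list as an inclusive range is an
- assumption about how check_nmap reads it.</para>

```python
def parse_threshold_list(text):
    """Turn '10,6,4' into [10.0, 6.0, 4.0]."""
    return [float(part) for part in text.split(",")]


def expand_port_list(text):
    """Turn '1000,1010,1050:1060,2000' into a flat list of port numbers."""
    ports = []
    for token in text.split(","):
        if ":" in token:
            first, last = token.split(":", 1)
            ports.extend(range(int(first), int(last) + 1))
        else:
            ports.append(int(token))
    return ports


def parse_disk_tokens(text):
    """Read '10000,10%' in any order; the '%' token marks the percentage."""
    result = {}
    for token in text.split(","):
        if token.endswith("%"):
            result["percent"] = float(token[:-1])
        else:
            result["kb"] = float(token)
    return result
```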
- <para>As always, comments are welcome - making this consistent
- without a host of long options was quite a hassle, and I would
- suspect that there are flaws in this strategy.
- </para>
- </section>
- </section>
- <section id="CodingGuidelines"><title>Coding guidelines</title>
- <para>See <ulink url="http://www.gnu.org/prep/standards_toc.html">GNU
- Coding standards</ulink> for general guidelines.</para>
- <section><title>Comments</title>
- <para>You should use /* */ for comments and not // as some compilers
- do not handle the latter form.</para>
- <para>If you have copied a routine from another source, make sure the licence
- from your source allows this. Add a comment referencing the ACKNOWLEDGEMENTS
- file, where you can put more detail about the source.</para>
- <para>For contributed code, do not add any named credits in the source code
- - contributors should be added into the THANKS.in file instead.
- </para>
- </section>
- <section><title>CVS comments</title>
- <para>When adding CVS comments at commit time, you can use the following prefixes:
- <variablelist>
- <varlistentry><term>- comment</term>
- <listitem>
- <para>for a comment that can be removed from the Changelog</para>
- </listitem>
- </varlistentry>
- <varlistentry><term>* comment</term>
- <listitem>
- <para>for an important amendment to be included into a features list</para>
- </listitem>
- </varlistentry>
- </variablelist>
- </para>
- <para>If the change is due to a contribution, please quote the contributor's name
- and, if applicable, add the SourceForge Tracker number. Don't forget to
- update the THANKS.in file.</para>
- </section>
- </section>
- <section id="SubmittingChanges"><title>Submission of new plugins and patches</title>
- <section id="Patches"><title>Patches</title>
- <para>If you have a bug patch, please supply a unified or context diff against the
- version you are using. For new features, please supply a diff against
- the CVS HEAD version.</para>
- <para>Patches should be submitted via
- <ulink url="http://sourceforge.net/tracker/?group_id=29880&atid=397599">SourceForge's
- tracker system for Nagiosplug patches</ulink>
- and be announced to the nagiosplug-devel mailing list.</para>
- <para>Submission of a patch implies that the submitter acknowledges that they
- are the author of the code (or have permission from the author to release the code)
- and agrees that the code can be released under the GPL. The copyright for the changes will
- then revert to the Nagios Plugin Development Team - this is required so that any copyright
- infringements can be investigated quickly without contacting a huge list of copyright holders.
- Credit will always be given for any patches through a THANKS file in the distribution.</para>
- </section>
- <section id="Newplugins"><title>New plugins</title>
- <para>If you would like others to use your plugins, please add them to
- the official 3rd party plugin repository,
- <ulink url="http://www.nagiosexchange.org">NagiosExchange</ulink>.
- </para>
- <para>We are not accepting requests for inclusion of plugins into
- our distribution at the moment, but when we do, these are the minimum
- requirements:
- </para>
- <orderedlist>
- <listitem>
- <para>The standard command options are supported (--help, --version,
- --timeout, --warning, --critical)</para>
- </listitem>
- <listitem>
- <para>It is determined to be not redundant (for instance, we would not
- add a new version of check_disk just because someone had provided
- a plugin that had perf checking - we would incorporate the features
- into an existing plugin)</para>
- </listitem>
- <listitem>
- <para>One of the developers has had the time to audit the code and declare
- it ready for core</para>
- </listitem>
- <listitem>
- <para>It should also follow code format guidelines, and use functions from
- utils (perl or c or sh) rather than using its own</para>
- </listitem>
- <listitem>
- <para>Includes patches to configure.in if required (via the EXTRAS list if
- it will only work on some platforms)</para>
- </listitem>
- <listitem>
- <para>If possible, please submit a test harness. Documentation on sample
- tests coming soon</para>
- </listitem>
- </orderedlist>
- </section>
- </section>
- </article>
-
- </book>