<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
<book>
<title>Nagios Plug-in Developer Guidelines</title>
<bookinfo>
<authorgroup>
<author>
<firstname>Karl</firstname>
<surname>DeBisschop</surname>
<affiliation>
<address><email>karl@debisschop.net</email></address>
</affiliation>
</author>
<author>
<firstname>Ethan</firstname>
<surname>Galstad</surname>
<authorblurb>
<para>Author of Nagios</para>
<para><ulink url="http://www.nagios.org"></ulink></para>
</authorblurb>
<affiliation>
<address><email>netsaint@linuxbox.com</email></address>
</affiliation>
</author>
<author>
<firstname>Hugo</firstname>
<surname>Gayosso</surname>
<affiliation>
<address><email>hgayosso@gnu.org</email></address>
</affiliation>
</author>
<author>
<firstname>Subhendu</firstname>
<surname>Ghosh</surname>
<affiliation>
<address><email>sghosh@sourceforge.net</email></address>
</affiliation>
</author>
<author>
<firstname>Stanley</firstname>
<surname>Hopcroft</surname>
<affiliation>
<address><email>stanleyhopcroft@sourceforge.net</email></address>
</affiliation>
</author>
</authorgroup>
<pubdate>2002</pubdate>
<title>Nagios plug-in development guidelines</title>
<revhistory>
<revision>
<revnumber>0.4</revnumber>
<date>2 May 2002</date>
</revision>
</revhistory>
<copyright>
<year>2000 2001 2002</year>
<holder>Karl DeBisschop, Ethan Galstad,
Hugo Gayosso, Stanley Hopcroft, Subhendu Ghosh</holder>
</copyright>
</bookinfo>
<preface id="preface"><title>Preface</title>
<para>The purpose of these guidelines is to provide a reference for
plug-in developers and to encourage the standardization of the
different kinds of plug-ins: C, shell, Perl, Python, etc.</para>
<para>Nagios Plug-in Development Guidelines Copyright (C) 2000 2001
2002
Karl DeBisschop, Ethan Galstad, Hugo Gayosso, Stanley Hopcroft,
Subhendu Ghosh</para>
<para>Permission is granted to make and distribute verbatim
copies of this manual provided the copyright notice and this
permission notice are preserved on all copies.</para>
<para>The plugins themselves are copyrighted by their respective
authors.</para>
</preface>
<article>
<section id="DevRequirements"><title>Development platform requirements</title>
<para>
Nagios plugins are developed to the GNU standard, so any OS which is supported by GNU
should run the plugins. While the requirements for compiling a Nagios plugins release
are very small, developing from CVS requires additional software to be installed. These are the
minimum levels of software required:
<literallayout>
GNU make 3.79
automake 1.6
autoconf 2.52
gettext 0.11.5
</literallayout>
To compile from CVS, after you have checked out the code, run:
<literallayout>
tools/setup
./configure
make
make install
</literallayout>
</para>
</section>
<section id="PlugOutput"><title>Plugin Output for Nagios</title>
<para>You should always print something to STDOUT that tells whether the
service is working or why it is failing. Try to keep the output short,
preferably less than 80 characters. Remember that you would ideally like
the entire output to appear in a pager message, which will get chopped
off after a certain length.</para>
<section><title>Print only one line of text</title>
<para>Nagios will only grab the first line of text from STDOUT
when it notifies contacts about potential problems. If you print
multiple lines, you're out of luck. Remember, keep it short and
to the point.</para>
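<para>A minimal C sketch of producing that single line. The helper name format_status_line() and the "STATE - summary" layout are assumptions for illustration, not part of the plugin distribution. It relies on snprintf() to cap the length and cuts the text at the first newline.</para>

```c
#include <stdio.h>
#include <string.h>

/* Build the one line of output Nagios reads from STDOUT (hypothetical
   helper). snprintf() never writes more than buflen bytes, so an 81-byte
   buffer caps the text at 80 characters plus the terminating NUL. */
static size_t format_status_line(char *buf, size_t buflen,
                                 const char *state, const char *summary)
{
    if (buflen == 0)
        return 0;
    snprintf(buf, buflen, "%s - %s", state, summary);
    /* Keep only the first line: anything after a newline is dropped. */
    buf[strcspn(buf, "\n")] = '\0';
    return strlen(buf);
}
```

<para>For example, a plugin could fill an 81-byte buffer with format_status_line(line, sizeof line, "OK", "load average 0.12") and then pass it to puts(), which prints "OK - load average 0.12" followed by a newline.</para>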
</section>
<section><title>Verbose output</title>
<para>Use the -v flag for verbose output. You should allow multiple
-v options for additional verbosity, up to a maximum of 3. The standard
type of output should be:</para>
<table id="verbose_levels"><title>Verbose output levels</title>
<tgroup cols="2">
<thead>
<row>
<entry><para>Verbosity level</para></entry>
<entry><para>Type of output</para></entry>
</row>
</thead>
<tbody>
<row>
<entry align=center><para>0</para></entry>
<entry><para>Single line, minimal output: summary</para></entry>
</row>
<row>
<entry align=center><para>1</para></entry>
<entry><para>Single line, additional information (e.g., list of processes that fail)</para></entry>
</row>
<row>
<entry align=center><para>2</para></entry>
<entry><para>Multi-line, configuration debug output (e.g., the ps command used)</para></entry>
</row>
<row>
<entry align=center><para>3</para></entry>
<entry><para>Lots of detail for plugin problem diagnosis</para></entry>
</row>
</tbody>
</tgroup>
</table>
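<para>The following C sketch shows one way to count repeated -v options with getopt() while enforcing the maximum of 3. The function name parse_verbosity() is hypothetical, and a real plugin would handle its other options in the same loop.</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

#define MAX_VERBOSITY 3   /* upper limit from the verbosity table */

/* Count -v options, including clustered forms such as -vv, and cap the
   result at MAX_VERBOSITY (hypothetical helper). Other options are
   ignored here; a real plugin handles them in the same loop. */
static int parse_verbosity(int argc, char *argv[])
{
    int verbose = 0;
    int c;

    optind = 1;   /* restart the scan; BSD-derived systems may also need optreset = 1 */
    opterr = 0;   /* do not print errors for options this sketch ignores */
    while ((c = getopt(argc, argv, "v")) != -1) {
        if (c == 'v' && verbose < MAX_VERBOSITY)
            verbose++;
    }
    return verbose;
}
```

<para>A plugin would then compare the returned level against 1, 2 or 3 before printing the extra detail described for each level.</para>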
</section>
<section><title>Screen Output</title>
<para>The plug-in should print the diagnostic and just the
synopsis part of the help message. A well-written plugin would
then offer --help as a way to get the verbose help.</para>
<para>Code and output should try to respect the 80x25 size of a
CRT (remember when fixing stuff in the server room!)</para>
</section>
<section><title>Return the proper status code</title>
<para>See <xref linkend="ReturnCodes"> below
for the numeric values of status codes and their
description. Remember to return an UNKNOWN state if bogus or
invalid command line arguments are supplied or if you are unable
to check the service.</para>
</section>
<section><title>Plugin Return Codes</title>
<para>The return codes below are based on the POSIX spec of returning
a positive value. Netsaint prior to v0.0.7 supported the non-POSIX
compliant return code of "-1" for unknown. Nagios supports POSIX return
codes by default.</para>
<para>Note: Some plugins will on occasion print on STDOUT that an error
occurred and that the error code is 138 or 255 or some such number. These
are usually caused by plugins that use system commands but do not check
thoroughly enough to catch unexpected output. Developers should include a
default catch-all for system command output that returns an UNKNOWN
return code.</para>
<table id="ReturnCodes"><title>Plugin Return Codes</title>
<tgroup cols="3">
<thead>
<row>
<entry><para>Numeric Value</para></entry>
<entry><para>Service Status</para></entry>
<entry><para>Status Description</para></entry>
</row>
</thead>
<tbody>
<row>
<entry align=center><para>0</para></entry>
<entry valign=middle><para>OK</para></entry>
<entry><para>The plugin was able to check the service and it
appeared to be functioning properly</para></entry>
</row>
<row>
<entry align=center><para>1</para></entry>
<entry valign=middle><para>Warning</para></entry>
<entry><para>The plugin was able to check the service, but it
appeared to be above some "warning" threshold or did not appear
to be working properly</para></entry>
</row>
<row>
<entry align=center><para>2</para></entry>
<entry valign=middle><para>Critical</para></entry>
<entry><para>The plugin detected that either the service was not
running or it was above some "critical" threshold</para></entry>
</row>
<row>
<entry align=center><para>3</para></entry>
<entry valign=middle><para>Unknown</para></entry>
<entry><para>Invalid command line arguments were supplied to the
plugin or the plugin was unable to check the status of the given
host/service</para></entry>
</row>
</tbody>
</tgroup>
</table>
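<para>A C sketch of the catch-all described above, assuming the output of some system command has already been captured as text and is expected to be a single number. The enum names are chosen for this sketch to mirror the table. The point is that anything that does not parse cleanly maps to UNKNOWN (3) rather than to whatever exit code happened to occur.</para>

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Values from the return code table; the names are chosen for this sketch. */
enum plugin_state {
    STATE_OK = 0,
    STATE_WARNING = 1,
    STATE_CRITICAL = 2,
    STATE_UNKNOWN = 3
};

/* Map the text printed by some system command to a service state,
   treating higher numbers as worse (an assumption of this sketch).
   Anything that is not a clean number falls through to UNKNOWN. */
static enum plugin_state state_from_output(const char *text,
                                           long warn, long crit)
{
    char *end;
    long value;

    if (text == NULL)
        return STATE_UNKNOWN;
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || errno == ERANGE)
        return STATE_UNKNOWN;            /* catch-all: unexpected output */
    while (isspace((unsigned char)*end))
        end++;                           /* tolerate a trailing newline */
    if (*end != '\0')
        return STATE_UNKNOWN;            /* trailing junk is also unexpected */
    if (value > crit)
        return STATE_CRITICAL;
    if (value > warn)
        return STATE_WARNING;
    return STATE_OK;
}
```

<para>A plugin would print its one-line summary and then return the resulting state from main(), so that the state becomes the process exit code Nagios reads.</para>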
</section>
</section>
<section id="SysCmdAuxFiles"><title>System Commands and Auxiliary Files</title>
<section><title>Don't execute system commands without specifying their
full path</title>
<para>Don't use exec(), popen(), etc. to execute external
commands without explicitly using the full path of the external
program.</para>
<para>Doing otherwise makes the plugin vulnerable to hijacking
by a trojan horse earlier in the search path. See the main
plugin distribution for examples of how this is done.</para>
</section>
<section><title>Use spopen() if external commands must be executed</title>
<para>If you have to execute external commands from within your
plugin and you're writing it in C, use the spopen() function
that Karl DeBisschop has written.</para>
<para>The code for spopen() and spclose() is included with the
core plugin distribution.</para>
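<para>The sketch below is not spopen() itself. It is a plain POSIX illustration (pipe(), fork(), execv()) of the two ideas above: refuse any program name that is not an absolute path, and run the program directly instead of through a shell, since execv() does not search PATH. The name run_command() is hypothetical. In a C plugin built with the distribution, prefer spopen() and spclose().</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (which must be an absolute path) without a shell and copy
   the first line of its standard output into out. Returns the program's
   exit status, or -1 on error (hypothetical helper, not spopen()). */
static int run_command(char *const argv[], char *out, size_t outlen)
{
    int fd[2];
    pid_t pid;
    size_t used = 0;
    ssize_t n;
    int status;

    /* Refuse relative names so a trojan earlier in PATH is never picked up. */
    if (argv[0] == NULL || argv[0][0] != '/' || outlen == 0)
        return -1;
    if (pipe(fd) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: route stdout into the pipe, then exec the program directly.
           execv() does not search PATH; exit status 127 means exec failed. */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execv(argv[0], argv);
        _exit(127);
    }
    close(fd[1]);
    while (used < outlen - 1
           && (n = read(fd[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    out[strcspn(out, "\n")] = '\0';   /* keep only the first line */
    /* Closing the read end early may kill a chatty child with SIGPIPE;
       that case is reported as -1 below. */
    close(fd[0]);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<para>A result of 127, -1, or output the plugin cannot parse should all end up as UNKNOWN, in line with the catch-all advice in the return code section.</para>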
</section>
<section><title>Don't make temp files unless absolutely required</title>
<para>If temp files are needed, make sure that the plugin will
fail cleanly if the file can't be written (e.g., too few file
handles, out of disk space, incorrect permissions, etc.) and
delete the temp file when processing is complete.</para>
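<para>A minimal C sketch, assuming POSIX mkstemp() is available. Every failure path returns -1 so the caller can report UNKNOWN, and the file is unlinked right after creation so it disappears even if the plugin is killed. The function name and the /tmp name template are hypothetical.</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write text to a private temporary file, read it back into out, and make
   sure the file is gone afterwards. Returns 0 on success, -1 on any
   failure, so the caller can report UNKNOWN (hypothetical helper). */
static int temp_file_roundtrip(const char *text, char *out, size_t outlen)
{
    char path[] = "/tmp/check_example.XXXXXX";  /* hypothetical template */
    size_t len = strlen(text);
    ssize_t n;
    int fd;

    if (outlen == 0)
        return -1;
    /* mkstemp() creates the file with O_CREAT | O_EXCL and mode 0600, so it
       never reuses an existing file or a symlink planted at that name. */
    fd = mkstemp(path);
    if (fd == -1)
        return -1;   /* e.g. no free file handles, disk full, bad permissions */
    /* Remove the name at once; the data lives only until fd is closed. */
    unlink(path);
    if (write(fd, text, len) != (ssize_t)len || lseek(fd, 0, SEEK_SET) == -1) {
        close(fd);
        return -1;
    }
    /* A single read is enough for the small amounts of data in this sketch. */
    n = read(fd, out, outlen - 1);
    close(fd);
    if (n == -1)
        return -1;
    out[n] = '\0';
    return 0;
}
```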
</section>
<section><title>Don't be tricked into following symlinks</title>
<para>If your plugin opens any files, take steps to ensure that
you are not following a symlink to another location on the
system.</para>
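<para>One way to do this in C, sketched under the assumption of a POSIX.1-2008 system that provides O_NOFOLLOW. open() refuses a symlink in the last path component, and fstat() on the resulting descriptor confirms that a regular file was opened, with no race between checking and opening. The helper name is hypothetical. Symlinks in the directory part of the path are still followed.</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open an existing file read-only without following a symlink in its last
   path component, and insist that it is a regular file. Returns a file
   descriptor, or -1 with errno set (hypothetical helper). */
static int open_regular_nofollow(const char *path)
{
    struct stat st;
    int fd;

    /* With O_NOFOLLOW, open() fails (ELOOP on Linux) if path is a symlink.
       Symlinks in the directory part of path are still followed. */
    fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd == -1)
        return -1;
    /* fstat() inspects the object that was actually opened, so there is no
       window between checking and opening as with lstat() before open(). */
    if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode)) {
        close(fd);
        errno = EINVAL;
        return -1;
    }
    return fd;
}
```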
</section>
<section><title>Validate all input</title>
<para>Use the routines in utils.c or utils.pm, and write more as needed.</para>
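<para>As an illustration of the kind of routine meant here, the following C sketch validates a numeric option argument strictly. The name parse_long_in_range() is hypothetical, so check utils.c for existing helpers before adding one like it.</para>

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Convert a decimal option argument and require lo <= value <= hi
   (hypothetical helper). Rejects empty strings, trailing junk such as
   "80x", and values that overflow a long or fall outside the range. */
static bool parse_long_in_range(const char *arg, long lo, long hi, long *out)
{
    char *end;
    long v;

    if (arg == NULL || *arg == '\0')
        return false;
    errno = 0;
    v = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || errno == ERANGE || v < lo || v > hi)
        return false;
    *out = v;
    return true;
}
```

<para>A plugin might call parse_long_in_range(optarg, 1, 65535, &amp;port) when handling -p and return UNKNOWN when the check fails.</para>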
</section>
</section>
<section id="PerlPlugin"><title>Perl Plugins</title>
<para>Perl plugins are coded a little more defensively than other
plugins because of embedded Perl. When configured as such, embedded
Perl Nagios (ePN) requires stricter use of some of Perl's features.
This section outlines some of the steps needed to use ePN
effectively.</para>
<orderedlist>
<listitem><para>Do not use BEGIN and END blocks, since under Embedded Perl (ePN)
they will be called the first time the plugin is run and when Nagios shuts down. In
particular, do not use BEGIN blocks to initialize variables.</para>
</listitem>
<listitem><para>To use utils.pm, you need to provide a full path to the
module in order for it to work with ePN.</para>
<literallayout>
e.g.
use lib "/usr/local/nagios/libexec";
use utils qw(...);
</literallayout>
</listitem>
<listitem><para>Perl scripts should be called with "-w".</para>
</listitem>
<listitem><para>All Perl plugins must compile cleanly under "use strict", i.e. at
least explicitly qualify package names as in "$main::x" or predeclare every
variable.</para>
<para>Explicitly initialize each variable in use. Otherwise, with
caching enabled, the plugin will not be recompiled each time, and
therefore Perl will not reinitialize all the variables. All old
variable values will still be in effect.</para>
</listitem>
<listitem><para>Do not use &lt;DATA&gt; (such constructs simply do not compile under ePN).</para>
</listitem>
<listitem><para>Do not use named subroutines.</para>
</listitem>
<listitem><para>If writing to a file (perhaps recording
performance data), explicitly close it. The plugin never
calls <emphasis role=strong>exit</emphasis>; that is caught by
p1.pl, so output streams are never closed.</para>
</listitem>
<listitem><para>As described in <xref linkend="runtime">, all plugins need
to monitor their runtime, especially if they are using network
resources. Use of <emphasis>alarm</emphasis> is recommended.
Plugins may import a default timeout ($TIMEOUT) from utils.pm.
</para>
</listitem>
<listitem><para>Perl plugins should import %ERRORS from utils.pm
and then "exit $ERRORS{'OK'}" rather than "exit 0".
</para>
</listitem>
</orderedlist>
</section>
<section id="runtime"><title>Runtime Timeouts</title>
<para>Plugins have a very limited runtime, typically 10 seconds.
As a result, it is very important for plugins to maintain internal
code to exit if runtime exceeds a threshold.</para>
<para>All plugins should time out gracefully, not just networking
plugins. For instance, df may lock up if you have automounted
drives and your network fails, though at first glance few would
expect df to hang like that. Besides, a plugin that can time out is
more error resistant than one that keeps consuming resources.</para>
<section><title>Use DEFAULT_SOCKET_TIMEOUT</title>
<para>All network plugins should use DEFAULT_SOCKET_TIMEOUT as their default timeout.</para>
</section>
<section><title>Add alarms to network plugins</title>
<para>If you write a plugin which communicates with another
networked host, you should make sure to set an alarm() in your
code that prevents the plugin from hanging due to abnormal
socket closures, etc. Nagios takes steps to protect itself
against unruly plugins that time out, but any plugins you create
should be well behaved on their own.</para>
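<para>A C sketch combining both points. It assumes DEFAULT_SOCKET_TIMEOUT is 10 seconds when the plugin headers do not already define it (that fallback value is an assumption of this sketch). The handler uses only async-signal-safe calls, write() and _exit(), because stdio functions such as printf() are not safe inside a signal handler.</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

#ifndef DEFAULT_SOCKET_TIMEOUT
#define DEFAULT_SOCKET_TIMEOUT 10   /* assumed fallback, in seconds */
#endif

#define STATE_CRITICAL 2            /* value from the return code table */

/* SIGALRM handler: uses only async-signal-safe calls (write, _exit). */
static void timeout_alarm_handler(int signo)
{
    static const char msg[] = "CRITICAL - Plugin timed out\n";
    ssize_t ignored;

    (void)signo;
    ignored = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    _exit(STATE_CRITICAL);
}

/* Install the handler and start the countdown. Returns 0, or -1 on error. */
static int arm_timeout(unsigned int seconds)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = timeout_alarm_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;
    alarm(seconds);
    return 0;
}
```

<para>A plugin would call arm_timeout(DEFAULT_SOCKET_TIMEOUT), or the value given with -t, before its first blocking operation. The state reported on timeout is a design choice. This sketch exits CRITICAL (2), treating an unresponsive service as a problem. A plugin that regards a timeout as being unable to check the service would exit UNKNOWN (3) instead.</para>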
</section>
</section>
<section id="PlugOptions"><title>Plugin Options</title>
<para>A well-written plugin should have --help as a way to get
verbose help. Code and output should try to respect the 80x25 size of a
CRT (remember when fixing stuff in the server room!)</para>
<section><title>Option Processing</title>
<para>For plugins written in C, we recommend the C standard
getopt library for short options. getopt_long is always available.
</para>
<para>For plugins written in Perl, we recommend the Getopt::Long module.</para>
<para>Positional arguments are strongly discouraged.</para>
<para>There are a few reserved options that should not be used
for other purposes:</para>
<literallayout>
-V version (--version)
-h help (--help)
-t timeout (--timeout)
-w warning threshold (--warning)
-c critical threshold (--critical)
-H hostname (--hostname)
-v verbose (--verbose)
</literallayout>
<para>In addition to the reserved options above, some other standard options are:</para>
<literallayout>
-C SNMP community (--community)
-a authentication password (--authentication)
-l login name (--logname)
-p port or password (--port or --passwd/--password)
-u url or username (--url or --username)
</literallayout>
<para>Look at check_pgsql and check_procs to see how I currently
think this can work. The standard options should behave as follows:</para>
<para>The option -V or --version should be present in all
plugins. For C plugins it should result in a call to print_revision, a
function in utils.c which takes two string arguments, the
command name and the plugin revision.</para>
<para>The -? option, or any other unparsable set of options,
should print out a short usage statement. Lines should be 80
characters wide or less, and no more than 23 lines should be printed (it
should display cleanly on a dumb terminal in a server
room).</para>
<para>The option -h or --help should be present in all plugins.
In C plugins, it should result in a call to print_help (or
equivalent). The function print_help should call print_revision,
then print_usage, then should provide detailed
help. Help text should fit on an 80-character width display, but
may run as many lines as needed.</para>
<para>The option -v or --verbose should be present in all plugins.
The user should be allowed to specify -v multiple times to increase
the verbosity level, as described in <xref linkend="verbose_levels">.</para>
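<para>A C sketch of option processing with getopt_long() for the reserved options, assuming a GNU-compatible getopt.h. The struct, the program name check_example and the usage text are hypothetical. A real plugin would call print_revision() and print_help() where the comments indicate. Bad, unknown (including -?) or leftover arguments yield UNKNOWN (3), as required above.</para>

```c
#include <errno.h>
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

enum { STATE_OK = 0, STATE_UNKNOWN = 3 };   /* values from the return code table */

/* Settings gathered from the reserved options (hypothetical layout). */
struct plugin_config {
    const char *hostname;
    long warning;
    long critical;
    long timeout;
    int verbose;
};

/* Strict decimal conversion; returns 0 if s is not a complete number. */
static int to_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;
    *out = v;
    return 1;
}

static void print_usage(void)
{
    puts("Usage: check_example -H host [-w warn] [-c crit] [-t timeout] [-v]");
}

/* Returns STATE_OK if the arguments make sense, STATE_UNKNOWN otherwise. */
static int process_arguments(int argc, char *argv[], struct plugin_config *cfg)
{
    static const struct option longopts[] = {
        {"version",  no_argument,       NULL, 'V'},
        {"help",     no_argument,       NULL, 'h'},
        {"timeout",  required_argument, NULL, 't'},
        {"warning",  required_argument, NULL, 'w'},
        {"critical", required_argument, NULL, 'c'},
        {"hostname", required_argument, NULL, 'H'},
        {"verbose",  no_argument,       NULL, 'v'},
        {NULL, 0, NULL, 0}
    };
    int c;

    optind = 1;   /* start a fresh scan on every call */
    while ((c = getopt_long(argc, argv, "Vht:w:c:H:v", longopts, NULL)) != -1) {
        switch (c) {
        case 'V':   /* a C plugin would call print_revision() here */
            puts("check_example (sketch)");
            exit(STATE_OK);
        case 'h':   /* ...and print_help() here */
            print_usage();
            exit(STATE_OK);
        case 't':
            if (!to_long(optarg, &cfg->timeout))
                return STATE_UNKNOWN;
            break;
        case 'w':
            if (!to_long(optarg, &cfg->warning))
                return STATE_UNKNOWN;
            break;
        case 'c':
            if (!to_long(optarg, &cfg->critical))
                return STATE_UNKNOWN;
            break;
        case 'H':
            cfg->hostname = optarg;
            break;
        case 'v':
            if (cfg->verbose < 3)
                cfg->verbose++;
            break;
        default:    /* '?': -?, an unknown option, or a missing argument */
            print_usage();
            return STATE_UNKNOWN;
        }
    }
    /* Positional arguments are discouraged, so leftovers are an error.
       Requiring warning <= critical assumes higher values are worse. */
    if (optind < argc || cfg->hostname == NULL || cfg->timeout <= 0
        || cfg->warning > cfg->critical)
        return STATE_UNKNOWN;
    return STATE_OK;
}
```

<para>main() would initialize the struct with defaults such as a 10-second timeout, call process_arguments(), and exit with its result straight away if it is not STATE_OK.</para>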
</section>
<section>
<title>Plugins with more than one type of threshold, or with
threshold ranges</title>
<para>The old style was to do things like -ct for critical time and
-cv for critical value. That goes out the window with POSIX
getopt. The allowable alternatives are:</para>
<orderedlist>
<listitem>
<para>long options like --critical-time (or -ct and -cv, I
suppose).</para>
</listitem>
<listitem>
<para>repeated options like `check_load -w 10 -w 6 -w 4 -c
16 -c 10 -c 10`</para>
</listitem>
<listitem>
<para>for brevity, the above can be expressed as `check_load
-w 10,6,4 -c 16,10,10`</para>
</listitem>
<listitem>
<para>ranges are expressed with colons as in `check_procs -C
httpd -w 1:20 -c 1:30`, which will warn above 20 instances,
and go critical at 0 and above 30</para>
</listitem>
<listitem>
<para>lists are expressed with commas, so Jacob's check_nmap
uses constructs like '-p 1000,1010,1050:1060,2000'</para>
</listitem>
<listitem>
<para>If possible when writing lists, use tokens to make the
list easy to remember and non-order dependent, so
check_disk uses '-c 10000,10%' so that it is clear which is
the percentage and which is the KB value (note that due to
my own lack of foresight, that used to be '-c 10000:10%', but
such constructs should all be changed for consistency,
though providing backward compatibility is fairly
easy).</para>
</listitem>
</orderedlist>
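<para>The colon range semantics from the check_procs example can be sketched in C as follows. Only the explicit LOW:HIGH form is handled. How a bare number such as -w 10 maps onto a range is left out, and all names are hypothetical. The critical range is tested first so that it takes precedence over the warning range.</para>

```c
#include <stdbool.h>
#include <stdlib.h>

enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2 };

/* Inclusive range LOW:HIGH; values outside it raise an alert. */
struct range {
    double low;
    double high;
};

/* Parse the "LOW:HIGH" form used in -w 1:20. Returns false if malformed. */
static bool parse_range(const char *s, struct range *r)
{
    const char *second;
    char *end;
    double low, high;

    low = strtod(s, &end);
    if (end == s || *end != ':')
        return false;
    second = end + 1;
    high = strtod(second, &end);
    if (end == second || *end != '\0' || high < low)
        return false;
    r->low = low;
    r->high = high;
    return true;
}

static bool outside(const struct range *r, double value)
{
    return value < r->low || value > r->high;
}

/* The critical range is checked first so that it takes precedence. */
static int range_state(double value, const struct range *warn,
                       const struct range *crit)
{
    if (outside(crit, value))
        return STATE_CRITICAL;
    if (outside(warn, value))
        return STATE_WARNING;
    return STATE_OK;
}
```

<para>With -w 1:20 -c 1:30, a count of 0 gives CRITICAL, 25 gives WARNING and 31 gives CRITICAL, matching the check_procs description above.</para>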
<para>As always, comments are welcome - making this consistent
without a host of long options was quite a hassle, and I would
suspect that there are flaws in this strategy.
</para>
</section>
</section>
<section id="CodingGuidelines"><title>Coding guidelines</title>
<para>See <ulink url="http://www.gnu.org/prep/standards_toc.html">GNU
Coding standards</ulink> for general guidelines.</para>
<section><title>Comments</title>
<para>You should use /* */ for comments and not // as some compilers
do not handle the latter form.</para>
</section>
<section><title>CVS comments</title>
<para>When adding CVS comments at commit time, you can use the following prefixes:
<variablelist>
<varlistentry><term>- comment</term>
<listitem>
<para>for a comment that can be removed from the Changelog</para>
</listitem>
</varlistentry>
<varlistentry><term>* comment</term>
<listitem>
<para>for an important amendment to be included in a features list</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</section>
</section>
<section id="SubmittingChanges"><title>Submission of new plugins and patches</title>
<section id="Patches"><title>Patches</title>
<para>If you have a bug patch, please supply a unified or context diff against the
version you are using. For new features, please supply a diff against
the CVS HEAD version.</para>
<para>Patches should be submitted via
<ulink url="http://sourceforge.net/tracker/?group_id=29880&amp;atid=397599">SourceForge's
tracker system for Nagiosplug patches</ulink>
and be announced to the nagiosplug-devel mailing list.</para>
</section>
<section id="New_plugins"><title>New plugins</title>
<para>If you would like others to use your plugins and have them included in
the standard distribution, please include patches for the relevant
configuration files, in particular "configure.in". Otherwise, submitted
plugins will be included in the contrib directory.</para>
<para>Plugins in the contrib directory are going to be migrated to the
standard plugins/plugin-scripts directory as time permits and per user
requests. The minimum requirements are:</para>
<orderedlist>
<listitem>
<para>The standard command options are supported (--help, --version,
--timeout, --warning, --critical)</para>
</listitem>
<listitem>
<para>It is determined to be not redundant (for instance, we would not
add a new version of check_disk just because someone had provided
a plugin that had perf checking - we would incorporate the features
into an existing plugin)</para>
</listitem>
<listitem>
<para>One of the developers has had the time to audit the code and declare
it ready for core</para>
</listitem>
<listitem>
<para>It should also follow code format guidelines, and use functions from
utils (Perl, C or sh) rather than cooking its own</para>
</listitem>
</orderedlist>
<para>New plugins should be submitted via
<ulink url="http://sourceforge.net/tracker/?group_id=29880&amp;atid=541465">SourceForge's
tracker system for Nagiosplug new plugins</ulink>
and be announced to the nagiosplug-devel mailing list.</para>
<para>For new plugins, provide a diff to add to the EXTRAS list (configure.in)
unless you are fairly sure that the plugin will work for all platforms with
no non-standard software added.</para>
<para>If possible, please submit a test harness. Documentation on sample
tests is coming soon.</para>
</section>
</section>
</article>
</book>