3. Object data configuration
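Now it's time to tell Nagios what to keep tabs on. To do so, we have to provide it with information about:
- when and how to run checks and send notifications;
- who to notify;
- which hosts and services to monitor.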
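All this information is represented by objects, which are defined in a set of "define" statements. Each statement is enclosed in curly braces and contains a variable number of newline-separated directives in keyword/value form. Keywords and values are separated by whitespace, and multiple values are separated by commas. Indentation is allowed within a statement.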
In short, the basic syntax of an object definition is:
Code:
define object {
keyword-1 value-1
keyword-2 value-2,value-3,...
[...]
keyword-n value-n
}
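To make the syntax above concrete, here is a minimal Python sketch of a parser for object definitions. It is not how Nagios itself reads its configuration. It assumes the following:
- braces are never nested;
- lines beginning with # or ; are comments;
- ; also starts an inline comment, as in the exceptions timeperiod below.

The helper names parse_objects and split_values are hypothetical.

```python
import re

# Sketch only: hypothetical helpers. One "define <type> { ... }" block; braces never nested.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)

def parse_objects(text):
    """Return a list of (object_type, {keyword: raw_value}) tuples."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for raw in body.splitlines():
            # Drop anything after ';' (inline comment) and the surrounding indentation
            line = raw.split(";", 1)[0].strip()
            if not line or line.startswith("#"):
                continue
            # Keyword and value are separated by the first run of whitespace
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects

def split_values(value):
    """Split a comma-separated value list, e.g. 'd,u,r' -> ['d', 'u', 'r']."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

Values are kept as raw strings and only split on request. This is because a command_line such as "-w 3000.0,80%" contains commas that are not list separators.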
Object definitions can be split across any number of files: just remember to list all of them in the main configuration file, using the cfg_file and/or cfg_dir directives.
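The main configuration file uses a simple key=value format. The following rough Python sketch collects the object files that cfg_file and cfg_dir would pull in; the function name is hypothetical. It assumes that a cfg_dir directory is scanned recursively for files ending in .cfg, as the Nagios documentation describes.

```python
import os

# Sketch only: hypothetical helper; cfg_dir is assumed to be scanned recursively for *.cfg
def object_config_files(main_cfg_lines):
    """Return the object files referenced by cfg_file/cfg_dir in nagios.cfg."""
    files = []
    for raw in main_cfg_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "cfg_file":
            files.append(value)
        elif key == "cfg_dir":
            # Walk the directory tree in a stable order, keeping only *.cfg files
            for root, dirs, names in os.walk(value):
                dirs.sort()
                files.extend(os.path.join(root, n) for n in sorted(names) if n.endswith(".cfg"))
    return files
```

For example, passing an open handle on the main configuration file would list the object files to read. Here that file is assumed, hypothetically, to be /var/www/etc/nagios/nagios.cfg.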
3.1 Timeperiod definitions
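Timeperiod definitions let you specify, for each day of the week, one or more time slots during which to run certain checks and/or notify certain people. Time slots can't cross midnight, and days to be excluded are simply omitted.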
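Because a slot cannot cross midnight, an overnight window has to be split at 24:00 and continued on the next day. The hypothetical Python helper below does that split, turning ranges such as 18:00-09:00 into per-day directive values. It only sketches the rule and does not merge overlapping ranges.

```python
# Sketch only: hypothetical helper that splits overnight slots at midnight
DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"]

def to_minutes(hhmm):
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)

def fmt(minutes):
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

def split_overnight(slots):
    """slots: {day: [(start, end), ...]}, where end <= start means the slot
    runs past midnight. Returns {day: 'HH:MM-HH:MM,...'} with no slot
    crossing midnight (overlapping slots are not merged)."""
    per_day = {day: [] for day in DAYS}
    for day, ranges in slots.items():
        next_day = DAYS[(DAYS.index(day) + 1) % 7]
        for start, end in ranges:
            s, e = to_minutes(start), to_minutes(end)
            if s < e:
                per_day[day].append((s, e))
            else:
                per_day[day].append((s, 24 * 60))  # evening part, up to 24:00
                if e > 0:
                    per_day[next_day].append((0, e))  # morning part, next day
    return {
        day: ",".join(f"{fmt(s)}-{fmt(e)}" for s, e in sorted(ranges))
        for day, ranges in per_day.items()
        if ranges
    }
```

For instance, split_overnight({"monday": [("18:00", "09:00")]}) yields {"monday": "18:00-24:00", "tuesday": "00:00-09:00"}. This is the same split used by the nonworkhours definition.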
In the following examples, all the timeperiod definitions are grouped together in a single file, timeperiods.cfg, stored in the /var/www/etc/nagios/ directory.
File /var/www/etc/nagios/timeperiods.cfg
Code:
# The following timeperiod definition includes normal work hours. The
# 'timeperiod_name' and 'alias' directives are mandatory. Note that weekend days
# are simply omitted
define timeperiod {
timeperiod_name workhours
alias Work Hours
monday 09:00-18:00
tuesday 09:00-18:00
wednesday 09:00-18:00
thursday 09:00-18:00
friday 09:00-18:00
}
# The following timeperiod includes all time outside normal work hours. The
# time slot between 6 p.m. and 9 a.m. must be split into two intervals, to avoid
# crossing midnight
define timeperiod {
timeperiod_name nonworkhours
alias Non-Work Hours
sunday 00:00-24:00
monday 00:00-09:00,18:00-24:00
tuesday 00:00-09:00,18:00-24:00
wednesday 00:00-09:00,18:00-24:00
thursday 00:00-09:00,18:00-24:00
friday 00:00-09:00,18:00-24:00
saturday 00:00-24:00
}
# Most checks will probably run on a continuous basis
define timeperiod {
timeperiod_name always
alias Every Hour Every Day
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
# The right timeperiod when you don't want to bother with notifications (e.g.
# during testing)
define timeperiod {
timeperiod_name never
alias No Time is a Good Time
}
# Some exceptions to the normal weekly time (see documentation for more examples)
define timeperiod {
timeperiod_name exceptions
alias Some random dates
2008-12-15 00:00-24:00 ; December 15th, 2008
friday 3 00:00-24:00 ; 3rd Friday of every month
february -1 00:00-24:00 ; Last day in February of every year
march 20 - june 21 00:00-24:00 ; Spring
day 1 - 15 00:00-24:00 ; First half of every month
2008-01-01 / 7 00:00-24:00 ; Every 7 days from Jan 1st, 2008
}
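The weekly timeperiods above can be read as a simple lookup table. The following Python sketch, with hypothetical names, checks whether a moment falls inside workhours, nonworkhours or never, as transcribed by hand from the file. It assumes the end of a slot is exclusive and ignores date-based exceptions like those in the exceptions definition.

```python
from datetime import datetime

# Sketch only: hypothetical names; slot ends are treated as exclusive
# weekday() returns 0 for Monday ... 6 for Sunday
WEEKDAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

def minutes(hhmm):
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)

def in_timeperiod(period, when):
    """period maps weekday names to 'HH:MM-HH:MM[,...]' strings, as in a
    'define timeperiod' block; days that are absent are excluded."""
    now = when.hour * 60 + when.minute
    for slot in period.get(WEEKDAY_NAMES[when.weekday()], "").split(","):
        if not slot:
            continue
        start, end = slot.split("-")
        if minutes(start) <= now < minutes(end):
            return True
    return False

# Hand transcription of the weekly timeperiods in timeperiods.cfg
WORKDAYS = WEEKDAY_NAMES[:5]
workhours = {day: "09:00-18:00" for day in WORKDAYS}
nonworkhours = {day: "00:00-09:00,18:00-24:00" for day in WORKDAYS}
nonworkhours.update(saturday="00:00-24:00", sunday="00:00-24:00")
never = {}
```

Since workhours and nonworkhours are complementary, every moment falls in exactly one of them, while never matches nothing.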
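3.2 Command definitions
The next step is to tell Nagios how to run checks and send notifications. This is done by defining a number of command objects, which specify the actual commands that Nagios will run.
A command definition consists of a short name and a command line (both mandatory), and may contain macros. As mentioned earlier, macros are variables enclosed in "$" signs. They are expanded to their corresponding values when a command is run, and they make command definitions simpler and more generic. Let's look at a simple example.
Suppose you want to check a web server with IP address "1.2.3.4". You could define a command like this:
Code: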
define command {
command_name check-http
command_line /usr/local/libexec/nagios/check_http -I 1.2.3.4
}
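This definition is correct and works. But what if you later need to add a new web server? How can you conveniently define a new, almost identical command that differs only in the IP address? By taking advantage of macros, you only need a single generic command:
Code: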
define command {
command_name check-http
command_line $USER1$/check_http -I $HOSTADDRESS$
}
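At run time, Nagios will expand the built-in $HOSTADDRESS$ macro to the IP address taken from the host definition (see below). As you may remember from the previous chapter, the $USER1$ macro holds the path to the plugins directory.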
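The expansion step can be pictured as a plain string substitution. This Python sketch is not Nagios' implementation. It assumes macro names are made of upper-case letters, digits or underscores between two $ signs, and it raises an error for an undefined macro, which is a choice made for the sketch. The plugin path /usr/local/libexec/nagios comes from the first check-http example above.

```python
import re

# Sketch only: hypothetical helper; raising on unknown macros is this sketch's choice
MACRO_RE = re.compile(r"\$([A-Z0-9_]+)\$")

def expand_macros(command_line, macros):
    """Replace every $NAME$ in command_line with macros['NAME']."""
    def substitute(match):
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"undefined macro ${name}$")
        return macros[name]
    return MACRO_RE.sub(substitute, command_line)
```

With {"USER1": "/usr/local/libexec/nagios", "HOSTADDRESS": "1.2.3.4"}, the generic command line expands to the same string as the hard-coded first example.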
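Now let's make things a little more complicated! How can Nagios check that a specific URL is available on each server? The URL may differ from server to server, so how can a single, generic command definition cover every host? Although it may sound contradictory, Nagios once again solves the problem with macros. The $ARGn$ macros (where n is a number between 1 and 32) stand for service-specific arguments, which are specified later in the service definitions (see below for details). The command definition above can therefore evolve into:
Code: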
define command {
command_name check-http
command_line $USER1$/check_http -I $HOSTADDRESS$ -u $ARG1$
}
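Besides the ones we've just seen, Nagios provides many other useful macros; please refer to the documentation for the list of available macros and their valid syntax. Below is a sample set of command definitions.
File /var/www/etc/nagios/commands.cfg
Code: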
################################################################################
# Notification commands #
# There are no standard notification plugins; hence notification commands are #
# usually custom scripts or mere command lines. #
################################################################################
define command {
command_name host-notify-by-email
command_line $USER1$/host_notify_by_email.sh $CONTACTEMAIL$
}
define command {
command_name notify-by-email
command_line $USER1$/notify_by_email.sh $CONTACTEMAIL$
}
define command {
command_name host-notify-by-SMS
command_line /usr/local/bin/sendsms $ADDRESS1$ "Nagios: Host $HOSTNAME$ ($HOSTADDRESS$) is in state: $HOSTSTATE$"
}
define command {
command_name notify-by-SMS
command_line /usr/local/bin/sendsms $ADDRESS1$ "Nagios: Service $SERVICEDESC$ on $HOSTALIAS$ is in state: $SERVICESTATE$"
}
################################################################################
# Check commands #
# The official Nagios plugins should handle most of your needs for host and #
# service checks. Anyway, should they not, we will discuss in a moment how to #
# write custom plugins. #
################################################################################
define command {
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
}
define command {
command_name check-ssh
command_line $USER1$/check_ssh $HOSTADDRESS$
}
define command {
command_name check-http
command_line $USER1$/check_http -I $HOSTADDRESS$ -u $ARG1$
}
define command {
command_name check-smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$
}
define command {
command_name check-imap
command_line $USER1$/check_imap -H $HOSTADDRESS$
}
define command {
command_name check-dns
command_line $USER1$/check_dns -s $HOSTADDRESS$ -H $ARG1$ -a $ARG2$
}
define command {
command_name check-mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u $USER2$ -p $USER3$
}
[...]
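3.3 Contact definitions
Contact objects let you specify the people who should be notified automatically when an alert condition is triggered. Contacts are first defined individually and then grouped together into contactgroup objects, for easier management.
In the following definitions, we refer for the first time to objects defined earlier. The values of the host_notification_period and service_notification_period directives must be timeperiod objects. The values of the host_notification_commands and service_notification_commands directives must be command objects.
File /var/www/etc/nagios/contacts.cfg
Code: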
define contact {
# Short name to identify the contact
contact_name john
# Longer name or description
alias John Doe
# Enable notifications for this contact
host_notifications_enabled 1
service_notifications_enabled 1
# Timeperiods during which the contact can be notified about host and service
# problems or recoveries
host_notification_period always
service_notification_period always
# Host states for which notifications can be sent out to this contact
# (d=down, u=unreachable, r=recovery, f=flapping, n=none)
host_notification_options d,u,r
# Service states for which notifications can be sent out to this contact
# (w=warning, c=critical, u=unknown, r=recovery, f=flapping, n=none)
service_notification_options w,u,c,r
# Command(s) used to notify the contact about host and service problems
# or recoveries
host_notification_commands host-notify-by-email,host-notify-by-SMS
service_notification_commands notify-by-email,notify-by-SMS
# Email address for the contact
email [email protected]
# Nagios provides 6 address directives (named address1 through address6) to
# specify additional "addresses" for the contact (e.g. a mobile phone number
# for SMS notifications)
address1 xxx-xxx-xxxx
# Allow this contact to submit external commands to Nagios from the CGIs
can_submit_commands 1
}
# The following contact is split in two, to allow for different notification
# options depending on the timeperiod
define contact {
contact_name danix@work
alias Daniele Mazzocchio
host_notifications_enabled 1
service_notifications_enabled 1
host_notification_period workhours
service_notification_period workhours
host_notification_options d,u,r
service_notification_options w,u,c,r
host_notification_commands host-notify-by-email
service_notification_commands notify-by-email
email [email protected]
can_submit_commands 1
}
define contact {
contact_name danix@home
alias Daniele Mazzocchio
host_notifications_enabled 1
service_notifications_enabled 1
host_notification_period nonworkhours
service_notification_period nonworkhours
host_notification_options d,u
service_notification_options c
host_notification_commands host-notify-by-email,host-notify-by-SMS
service_notification_commands notify-by-email,notify-by-SMS
email [email protected]
address1 xxx-xxx-xxxx
can_submit_commands 1
}
[...]
# All administrator contacts are grouped together in the "Admins"
# contactgroup
define contactgroup {
contactgroup_name Admins
alias Nagios Administrators
members danix@work,danix@home,john
}
[...]
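The split between danix@work and danix@home works because the two timeperiods are complementary. A rough Python model of the routing shows which contact, and which commands, a service alert would reach. It uses hypothetical names and transcribes only the service-side options. It also ignores john, contact groups and most of Nagios' real notification logic.

```python
from datetime import datetime

# Sketch only: hypothetical, simplified model of service notification routing
def in_workhours(when):
    """'workhours' timeperiod: Monday to Friday, 09:00-18:00."""
    return when.weekday() < 5 and 9 * 60 <= when.hour * 60 + when.minute < 18 * 60

# Service-side settings of the two danix contacts, transcribed from contacts.cfg
CONTACTS = {
    "danix@work": {
        "period": in_workhours,
        "options": {"w", "u", "c", "r"},
        "commands": ["notify-by-email"],
    },
    "danix@home": {
        "period": lambda when: not in_workhours(when),  # 'nonworkhours'
        "options": {"c"},
        "commands": ["notify-by-email", "notify-by-SMS"],
    },
}

# Service states mapped to notification option letters, plus r for recoveries
OPTION_FOR = {"WARNING": "w", "UNKNOWN": "u", "CRITICAL": "c", "RECOVERY": "r"}

def route(event, when):
    """Return the (contact, command) pairs that would fire for this event."""
    letter = OPTION_FOR[event]
    return [
        (name, command)
        for name, contact in CONTACTS.items()
        if contact["period"](when) and letter in contact["options"]
        for command in contact["commands"]
    ]
```

During work hours a WARNING reaches danix@work by email. At night only a CRITICAL gets through, and it goes to danix@home by both email and SMS.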
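3.4 Host definitions
We finally come to the most important part of the Nagios configuration: the definitions of the hosts (servers, workstations, devices, etc.) that we want to monitor. This also introduces one of the most powerful features of the Nagios configuration: object inheritance. Although we discuss it here for the first time, object inheritance applies to all Nagios objects. It is in host and service definitions, however, that it shows its full power.
Configuring a host requires setting quite a few parameters, and most of them have the same values for most hosts. Without object inheritance, you would waste a lot of time typing the same parameters over and over again. You would also end up with long, messy, unmanageable configuration files.
Fortunately, the Nagios developers came up with template objects, whose properties can be "inherited" by other objects without having to rewrite them. Here is an example of a template:
Code: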
define host {
name generic-host-template ; Template name
check_command check-host-alive
check_period always
max_check_attempts 5
notification_options d,u,r
register 0 ; Don't register it!
}
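As you can see, a template definition is almost identical to a regular object definition. The only differences are:
- each template must be given a name with the name directive;
- since it isn't a real host, you must tell Nagios not to register it, by setting the register directive to 0. This directive is not inherited and defaults to 1, so you don't need to reset it in every "child" object;
- a template may be left incomplete, i.e. it doesn't have to define all the mandatory parameters.
To create an actual host from a template, just set the use directive to the template's name. Then make sure that every mandatory field is either inherited or explicitly set:
Code: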
define host {
host_name hostname
use generic-host-template
alias alias
address x.x.x.x
}
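A small Python model helps show what use does. The sketch below, with hypothetical names, merges an object with its chain of templates, applying the most generic template first. Following the rules listed above, name and register are never inherited, and register falls back to 1. It assumes a single template per use directive and does not implement Nagios' other inheritance features.

```python
# Sketch only: hypothetical model of template inheritance (one template per 'use')
NOT_INHERITED = {"name", "register", "use"}

def resolve(obj, templates):
    """Return the effective directives of obj after following its 'use' chain."""
    chain, seen = [], set()
    current = obj.get("use")
    while current is not None:
        if current in seen:
            raise ValueError(f"template loop through {current!r}")
        seen.add(current)
        template = templates[current]
        chain.append(template)
        current = template.get("use")
    effective = {}
    # Apply the most generic template first, so nearer layers override it
    for template in reversed(chain):
        effective.update({k: v for k, v in template.items() if k not in NOT_INHERITED})
    effective.update(obj)  # the object's own directives always win
    effective.setdefault("register", "1")  # 'register' is not inherited; defaults to 1
    return effective
```

The same function also handles multi-level chains. A host using one template that itself uses another picks up directives from both.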
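Now let's move from theory to practice and define two host templates. Note that the second one inherits from the first; this is possible because Nagios allows multiple levels of template objects.
File /var/www/etc/nagios/generic-hosts.cfg
Code: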
# The following is a template for all hosts in the LAN
define host {
# Template name
name generic-lan-host
# Command to use to check the state of the host
check_command check-host-alive
# Contact groups to notify about problems (or recoveries) with this host
contact_groups Admins
# Enable active checks
active_checks_enabled 1
# Time period during which active checks of this host can be made
check_period always
# Number of times that Nagios will repeat a check returning a non-OK state
max_check_attempts 3
# Enable the event handler
event_handler_enabled 1
# Enable the processing of performance data
process_perf_data 1
# Enable retention of host status information across program restarts
retain_status_information 1
# Enable retention of host non-status information across program restarts
retain_nonstatus_information 1
# Enable notifications
notifications_enabled 1
# Time interval (in minutes) between consecutive notifications about the
# server being _still_ down or unreachable
notification_interval 120
# Time period during which notifications about this host can be sent out
notification_period always
# Host states for which notifications should be sent out (d=down,
# u=unreachable, r=recovery, f=flapping, n=none)
notification_options d,u,r
# Don't register this definition: it's only a template, not an actual host
register 0
}
# DMZ hosts inherit all attributes from the generic-lan-host by means of the
# 'use' directive. The only difference is that Nagios has to go through the
# internal (CARP) firewalls to reach the DMZ servers, thus requiring the
# additional 'parents' directive.
define host {
name generic-dmz-host
# The 'use' directive specifies the name of a template object that you want
# this host to inherit properties from
use generic-lan-host
# This directive specifies the hosts that lie between the monitoring host
# and the remote host (more information here)
parents fw-int
# This too is a template
register 0
}
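The parents directive lets Nagios distinguish a host that is actually down from one that can't be reached because something in between has failed. The following Python sketch models that distinction in simplified form. Its names are hypothetical, and it assumes the parent graph has no cycles. A host whose own check fails is classified as follows:
- DOWN, if it has no parents or at least one parent is UP;
- UNREACHABLE, if all of its parents are down or unreachable.

```python
# Sketch only: hypothetical, simplified model of DOWN vs UNREACHABLE (no parent cycles)
def host_states(parents, failing):
    """parents: {host: [parent, ...]}; failing: set of hosts whose own check fails.
    Returns {host: 'UP' | 'DOWN' | 'UNREACHABLE'}."""
    states = {}

    def state_of(host):
        if host not in states:
            if host not in failing:
                states[host] = "UP"
            else:
                host_parents = parents.get(host, [])
                # UNREACHABLE only when every parent is itself not UP
                if host_parents and all(state_of(p) != "UP" for p in host_parents):
                    states[host] = "UNREACHABLE"
                else:
                    states[host] = "DOWN"
        return states[host]

    for host in parents:
        state_of(host)
    return states
```

If fw-int fails together with the DMZ hosts behind it, only fw-int is reported DOWN; mail and proxy become UNREACHABLE.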
Thanks to templates, we can now define the actual hosts in just a few lines.
File /var/www/etc/nagios/hosts/servers.cfg
Code:
# Configuration for host dns1.lan.kernel-panic.it
define host {
use generic-lan-host
host_name dns1
alias LAN primary master name server
address 172.16.0.161
# Extended information (completely optional)
notes This is the internal primary master name server (Bind 9.4.2-P2)
# URL with more information about this host
notes_url http://www.kernel-panic.it/openbsd/dns/
# Image associated with this host in the status CGI; images must be placed in
# /var/www/nagios/images/logos/
icon_image dns.png
# String used in the 'alt' tag of the icon_image
icon_image_alt [dns]
# Image associated with this host in the statusmap CGI
statusmap_image dns.gd2
}
# Configuration for host mail.kernel-panic.it
define host {
use generic-dmz-host
host_name mail
alias Mail server
address 172.16.240.150
notes This is the Postfix mail server (with IMAP(S) and web access)
notes_url http://www.kernel-panic.it/openbsd/mail/
icon_image mail.png
icon_image_alt [Mail]
statusmap_image mail.gd2
}
# Configuration for host proxy.kernel-panic.it
define host {
use generic-dmz-host
host_name proxy
alias Proxy server
notes This is the Squid proxy server
notes_url http://www.kernel-panic.it/openbsd/proxy/
icon_image proxy.png
icon_image_alt [Proxy]
statusmap_image proxy.gd2
}
[...]
File /var/www/etc/nagios/hosts/firewalls.cfg
Code:
# Configuration for host fw-int.kernel-panic.it
define host {
use generic-lan-host
host_name fw-int
alias Internal firewalls' CARP address
address 172.16.0.202
notes Virtual CARP address of the internal firewalls
notes_url http://www.kernel-panic.it/openbsd/carp/
icon_image fw.png
icon_image_alt [FW]
statusmap_image fw.gd2
}
# Configuration for host mickey.kernel-panic.it
define host {
use generic-lan-host
host_name mickey
alias Internal Firewall #1
address 172.16.0.200
notes Internal firewall (first node of a two-node CARP cluster)
notes_url http://www.kernel-panic.it/openbsd/carp/
icon_image fw.png
icon_image_alt [FW]
statusmap_image fw.gd2
}
[...]
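Hosts can optionally be grouped together with hostgroup definitions. Although this matters little for monitoring itself, it lets you display all the hosts in a group together in the CGIs.
File /var/www/etc/nagios/hosts/hostgroups.cfg
Code: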
# Domain Name Servers
define hostgroup {
hostgroup_name DNS
alias Domain Name Servers
members dns1,dns2,dns3,dns4
notes Our internal Domain Name Servers, running Bind 9.4.2-P2
}
# Firewalls
define hostgroup {
hostgroup_name firewalls
alias CARP Firewalls
members mickey,minnie,donald,daisy,fw-int,fw-ext
notes Our CARP-enabled firewalls (both virtual and physical addresses)
}
# Web servers
define hostgroup {
hostgroup_name WWW
alias Web Servers
members www1,www2
notes Our corporate web servers, running Apache 1.3
}
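3.5 Service definitions
Configuring the services to monitor is very similar to configuring hosts. Object inheritance saves you a lot of typing, and services can be grouped together with servicegroup definitions. Here is our service template:
File /var/www/etc/nagios/generic-services.cfg
Code: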
define service {
# Template name
name generic-service
# Services are normally not volatile
is_volatile 0
# Contact groups to notify about problems (or recoveries) with this service
contact_groups Admins
# Enable active checks
active_checks_enabled 1
# Time period during which active checks of this service can be made
check_period always
# Time interval (in minutes) between "regular" checks, i.e. checks that
# occur when the service is in an OK state or when the service is in a non-OK
# state, but has already been re-checked max_check_attempts number of times
normal_check_interval 5
# Time interval (in minutes) between non-regular checks
retry_check_interval 1
# Number of times that Nagios will repeat a check returning a non-OK state
max_check_attempts 3
# Enable service check parallelization for better performance
parallelize_check 1
# Enable passive checks
passive_checks_enabled 1
# Enable the event handler
event_handler_enabled 1
# Enable the processing of performance data
process_perf_data 1
# Enable retention of service status information across program restarts
retain_status_information 1
# Enable retention of service non-status information across program restarts
retain_nonstatus_information 1
# Enable notifications
notifications_enabled 1
# Time interval (in minutes) between consecutive notifications about the
# service being _still_ in non-OK state
notification_interval 120
# Time period during which notifications about this service can be sent out
notification_period always
# Service states for which notifications should be sent out (c=critical,
# w=warning, u=unknown, r=recovery, f=flapping, n=none)
notification_options w,u,c,r
register 0
}
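The interplay between normal_check_interval, retry_check_interval and max_check_attempts is easier to follow as a timeline. This Python sketch replays a sequence of check results with the template's values and labels each check SOFT or HARD. Its names are hypothetical, and it is simplified: it ignores soft recoveries, check latency and scheduling jitter. In Nagios, notifications are only sent once a problem reaches a HARD state.

```python
# Sketch only: hypothetical, simplified model of soft/hard states
# (default arguments mirror the generic-service template)
def simulate(results, normal_check_interval=5, retry_check_interval=1, max_check_attempts=3):
    """results: sequence of booleans (True = check returned OK).
    Returns (minute, result, state_type) for every check."""
    minute, attempt, log = 0, 0, []
    for ok in results:
        if ok:
            attempt, state_type, delay = 0, "HARD", normal_check_interval
        else:
            attempt = min(attempt + 1, max_check_attempts)
            if attempt < max_check_attempts:
                # Not yet confirmed: re-check after retry_check_interval
                state_type, delay = "SOFT", retry_check_interval
            else:
                # Confirmed problem: back to the regular interval
                state_type, delay = "HARD", normal_check_interval
        log.append((minute, "OK" if ok else "PROBLEM", state_type))
        minute += delay
    return log
```

With the template's values, a service that starts failing at minute 5 is re-checked at minutes 6 and 7. It becomes a HARD problem on the third failed check, and after that it is checked every 5 minutes again.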
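Before moving on to the service definitions, we should finish our discussion of how to pass service-specific arguments to commands, i.e. the $ARGn$ macros. As you may recall, these macros are placeholders: in a service check, $ARGn$ expands to the n-th argument passed to the command. For example, a command like the following expects two arguments:
Code: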
define command {
command_name some-command
command_line $USER1$/check_something $ARG1$ $ARG2$
}
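Therefore, to configure a service check that uses the command above, we must set the check_command directive to a string. The string contains the command's short name followed by the arguments, each separated by a "!" character. For example:
Code: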
define service {
service_description some-service
check_command some-command!arg-1!arg-2
[...]
}
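Here is a Python sketch that puts the "!" syntax and the $ARGn$ macros together; its names are hypothetical and it is not Nagios' own code. It shows how a check_command string could be turned into the final command line. The part before the first "!" selects the command, and each following field becomes $ARG1$, $ARG2$, and so on. The plugin path and host addresses in the test come from earlier sections, while the DNS arguments are made up.

```python
import re

# Sketch only: hypothetical helper; unknown macros are left untouched
MACRO = re.compile(r"\$([A-Z0-9_]+)\$")

# Command definitions transcribed from commands.cfg
COMMANDS = {
    "check-http": "$USER1$/check_http -I $HOSTADDRESS$ -u $ARG1$",
    "check-dns": "$USER1$/check_dns -s $HOSTADDRESS$ -H $ARG1$ -a $ARG2$",
}

def build_command_line(check_command, commands, macros):
    """Split 'name!arg1!arg2...' and expand the command's macros."""
    name, *args = check_command.split("!")
    if len(args) > 32:
        raise ValueError("only $ARG1$ through $ARG32$ are available")
    values = dict(macros)
    values.update({f"ARG{i}": arg for i, arg in enumerate(args, start=1)})
    return MACRO.sub(lambda m: values.get(m.group(1), m.group(0)), commands[name])
```

For the mail host, "check-http!/webmail/index.html" becomes a check_http call with "-u /webmail/index.html". The same command definition serves www1 and www2 with "/index.html".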
Now we can move on to the actual service definitions:
File /var/www/etc/nagios/services/services.cfg
Code:
# Secure Shell service
define service {
use generic-service
service_description SSH
# Short name(s) of the host(s) that run this service. If a service runs on all
# hosts, you may use the '*' wildcard character
host_name *
check_command check-ssh
# This directive is a possible alternative to using the members directive in
# service groups definitions
servicegroups ssh-services
# Extended information
notes Availability of the SSH daemon
notes_url http://www.openssh.org/
icon_image ssh.png
icon_image_alt [SSH]
}
# Web service
define service {
use generic-service
service_description WWW
host_name www1,www2
check_command check-http!/index.html
notes Availability of the corporate web sites
notes_url http://www.apache.org/
icon_image www.png
icon_image_alt [WWW]
}
define service {
use generic-service
service_description WWW
host_name mail
check_command check-http!/webmail/index.html
notes Availability of the web access to the mail server
notes_url http://www.squirrelmail.org/
icon_image www.png
icon_image_alt [WWW]
}
[...]
Just like hosts, services can be grouped together with the servicegroup directive:
File /var/www/etc/nagios/services/servicegroups.cfg
Code:
define servicegroup {
servicegroup_name www-services
alias Web Services
# The 'members' directive requires a comma-separated list of host and
# service pairs, e.g. 'host1,service1,host2,service2,...'
members www1,WWW,www2,WWW,mail,WWW
}
define servicegroup {
servicegroup_name dns-services
alias Domain Name Service
members dns1,DNS,dns2,DNS,dns3,DNS,dns4,DNS
}
# The members of the following servicegroup are specified with the
# 'servicegroups' directive in the 'SSH' service definition
define servicegroup {
servicegroup_name ssh-services
alias Secure Shell Service
}
[...]
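A servicegroup's members value is a flat list read two items at a time. The hypothetical Python helpers below pair up host and service names. They also pick up services that name the group through their own servicegroups directive, as the comment in the ssh-services definition describes. Wildcard host names such as * are not expanded.

```python
# Sketch only: hypothetical helpers for servicegroup membership
def member_pairs(members):
    """Turn 'host1,service1,host2,service2,...' into [(host1, service1), ...]."""
    items = [item.strip() for item in members.split(",") if item.strip()]
    if len(items) % 2:
        raise ValueError("members must list host,service pairs")
    return list(zip(items[0::2], items[1::2]))

def group_members(group_name, members, services):
    """Combine a group's own 'members' value with services that name the group
    in their 'servicegroups' directive.
    services: iterable of (host_name, service_description, [servicegroups])."""
    pairs = member_pairs(members) if members else []
    for host, description, groups in services:
        if group_name in groups and (host, description) not in pairs:
            pairs.append((host, description))
    return pairs
```

For example, member_pairs("www1,WWW,www2,WWW,mail,WWW") gives the three (host, "WWW") pairs of www-services.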
Good, the bulk of the work is done. The last step is to configure the web interface, and then we can get Nagios up and running!