This post was last edited by monicazhang on 2015-10-30 21:40
define host {
    use                     host-pnp
    host_name               nagios-test1
    alias                   nagios test1
    address                 172.26.188.202
    contact_groups          admins
    check_command           check-host-alive
    max_check_attempts      5
    notification_interval   10
    notification_period     24x7
    notification_options    d,u,r
}

Modify services.cfg

services.cfg defines the items monitored on the monitored host. The configuration is as follows:

define service {
    use                     srv-pnp
    host_name               nagios-test1
    service_description     check_tcp 80
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    event_handler_enabled   1
    event_handler           restart-httpd
    check_command           check_tcp!80
}

Monitored host configuration

Install the plugins on the monitored host

    # groupadd nagios
    # useradd -g nagios -d /usr/local/nagios nagios
    # tar -zxvf nagios-plugins-1.4.16.tar.gz
    # cd nagios-plugins-1.4.16
    # ./configure --with-nagios-user=nagios --with-nagios-group=nagios
    # make
    # make install

Install nrpe on the monitored host

    # tar -zxvf nrpe-2.13.tar.gz
    # cd nrpe-2.13
    # ./configure --prefix=/usr/local/nagios
    # make all
    # make install-plugin
    # make install-daemon
    # make install-daemon-config
    # chown -R nagios:nagios /usr/local/nagios
Start nrpe through xinetd

    # vi /etc/xinetd.d/nrpe

    service nrpe
    {
        flags           = REUSE
        socket_type     = stream
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        only_from       = 127.0.0.1 172.26.188.201
    }
Add the following line at the appropriate place in /etc/services:

    nrpe            5666/tcp                        # NRPE
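The /etc/services step can be made repeatable so that running it twice does not add a duplicate line. The following is a minimal sketch; the function name ensure_nrpe_service is hypothetical and not part of NRPE or xinetd, and the file path is passed as an argument so the function can be tried on a copy first.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NRPE): append the nrpe entry to a
# services file only when no "nrpe ... 5666/tcp" line is present yet.
ensure_nrpe_service() {
    file="${1:-/etc/services}"
    if grep -Eq '^nrpe[[:space:]]+5666/tcp' "$file"; then
        # Entry already there, leave the file untouched
        echo "present"
    else
        printf 'nrpe\t\t5666/tcp\t\t\t# NRPE\n' >> "$file"
        echo "added"
    fi
}
```

Calling ensure_nrpe_service with no argument would act on /etc/services itself and needs root privileges.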
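Start nrpe

    # service xinetd restart

Configuration

The following files were modified on the monitored host:

    /usr/local/nagios/etc/nrpe.cfg
    /etc/hosts.allow
    /etc/rc.local

1) Configure NRPE

First check that the plugin scripts are present in /usr/local/nagios/libexec. These are the scripts that the commands defined in nrpe.cfg call.

Edit /usr/local/nagios/etc/nrpe.cfg, setting allowed_hosts to the IP of the monitoring host and server_address to the IP of this host:

    allowed_hosts=172.26.188.201
    server_address=172.26.188.202

Then add custom commands to nrpe.cfg. Take the following command as an example:

    command[check_sdb2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sdb2

This monitors disk usage. check_sdb2 is a user-chosen command name and check_disk is the plugin that is called. -w 20% raises a warning when 20% of the capacity remains free, -c 10% sets the critical threshold, and -p specifies the partition to monitor.

2) Edit /etc/hosts.allow and add the IP of the monitoring host

    # echo 'nrpe:<monitoring host IP>' >> /etc/hosts.allow

3) Start the NRPE daemon (only when NRPE is not already running under xinetd, since both would listen on port 5666):

    # /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

4) Start NRPE automatically at boot

    # echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local

Check the port:

    # netstat -an | grep 5666

Port 5666 should now be listed as open.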
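Because every command[...] entry in nrpe.cfg must point at a script that actually exists under libexec, a quick check before restarting NRPE can catch typos. The following is a minimal sketch under stated assumptions: the function name check_nrpe_commands is hypothetical, and it assumes each entry sits on one line in the form command[name]=/path/to/plugin args.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NRPE): for each command[...] line in
# an nrpe.cfg-style file, report whether the plugin it calls is executable.
# Returns 1 if at least one plugin is missing, 0 otherwise.
check_nrpe_commands() {
    cfg="$1"
    rc=0
    while IFS= read -r line; do
        # Skip everything that is not a command[name]=... definition
        case "$line" in
            command\[*\]=*) ;;
            *) continue ;;
        esac
        name=${line#command\[}      # drop the leading "command["
        name=${name%%]*}            # keep the text before the first "]"
        cmdline=${line#*=}          # everything after the first "="
        plugin=${cmdline%% *}       # first word is the plugin path
        if [ -x "$plugin" ]; then
            echo "OK $name $plugin"
        else
            echo "MISSING $name $plugin"
            rc=1
        fi
    done < "$cfg"
    return $rc
}
```

On the monitored host it could be run as check_nrpe_commands /usr/local/nagios/etc/nrpe.cfg. Any MISSING line points at a command that the monitoring host would not be able to execute.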
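Key points: when adding a monitored host, first make sure the command line has been added to nrpe.cfg on the monitored host, which in turn requires the corresponding plugin script to exist in /usr/local/nagios/libexec on that host. Second, the command.cfg file on the monitoring host needs a matching entry, that is, a check command added according to the nrpe definitions in that file. Finally, define the actual monitored items in services.cfg on the monitoring host.

Automatic event handling in Nagios

Running an event handler from Nagios over SSH

Nagios uses event handlers to attempt an initial repair of a fault before anyone receives a notification.

An event handler command can be a shell or Perl script. The script should handle the following macros: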
For services: $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$;
For hosts: $HOSTSTATE$, $HOSTSTATETYPE$ and $HOSTATTEMPT$.
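The script must inspect these values, which are passed in as command-line arguments, and take whatever action they call for.

The example below has Nagios monitor the apache service and start apache automatically when it stops. The detailed steps are as follows:

1) Set up passwordless login from the Nagios monitoring host to the remote monitored host

- Generate the ssh key files

    # su - nagios
    $ ssh-keygen -t rsa
    (press Enter at every prompt and do not set a passphrase)
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/nagios/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/nagios/.ssh/id_rsa.
    Your public key has been saved in /home/nagios/.ssh/id_rsa.pub.
    The key fingerprint is:

- Copy the generated public key to the remote monitored host

    $ scp .ssh/id_rsa.pub nagios-test1:/usr/local/nagios/

- Install the public key on the monitored host

    $ ssh nagios@nagios-test1
    nagios@nagios-test1's password:
    $ mkdir -p .ssh && chmod 700 .ssh    (only needed if .ssh does not exist yet)
    $ cat id_rsa.pub >> .ssh/authorized_keys
    $ chmod 600 .ssh/authorized_keys
    $ exit

- Test the passwordless login

    $ ssh nagios@nagios-test1

2) Configure sudo on the remote monitored host

This lets the nagios user run the /usr/local/nagios/libexec/eventhandlers/restart-httpd script as root.

    # visudo

Add the following line:

    nagios ALL=(root) NOPASSWD:/usr/local/nagios/libexec/eventhandlers/restart-httpd

Comment out the following line so that it reads:

    #Defaults requiretty

3) Write the apache restart script on the remote host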
    # vi /usr/local/nagios/libexec/eventhandlers/restart-httpd

The contents are as follows:

    #!/bin/sh
    #
    # Event handler script for restarting the Apache server on the remote machine
    #
    # Note: This script will only restart the Apache server if the service is
    #       retried 2 times (in a "soft" state) or if the web service somehow
    #       manages to fall into a "hard" error state.
    #

    # What state is the Apache service in?
    case "$1" in
    OK)
        ;;
    WARNING)
        ;;
    UNKNOWN)
        ;;
    CRITICAL)
        # Is this a "soft" or a "hard" state?
        case "$2" in
        SOFT)
            # What check attempt are we on? We don't want to restart the Apache server on the first
            # check, because it may just be a fluke!
            case "$3" in
            2)