Nagios Installation and Configuration (3)

monicazhang

Last edited by monicazhang on 2015-10-30 21:40

20151030淡然

Continued from the previous part.

define host {

use host-pnp

host_name nagios-test1

alias nagios test1

address 172.26.188.202

contact_groups admins

check_command check-host-alive

max_check_attempts 5

notification_interval 10

notification_period 24x7

notification_options d,u,r

}

Modify services.cfg

services.cfg defines the items to be monitored on the monitored host. The configuration is as follows:

define service {

use srv-pnp

host_name nagios-test1

service_description check_tcp 80

check_period 24x7

max_check_attempts 4

normal_check_interval 3

retry_check_interval 2

contact_groups admins

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

event_handler_enabled 1

event_handler restart-httpd

check_command check_tcp!80

}

Monitored host configuration

Install the plugins on the monitored host

#groupadd nagios

#useradd -g nagios -d /usr/local/nagios nagios

#tar -zxvf nagios-plugins-1.4.16.tar.gz

#cd nagios-plugins-1.4.16

#./configure --with-nagios-user=nagios --with-nagios-group=nagios

# make

# make install

Install NRPE on the monitored host

#tar -zxvf nrpe-2.13.tar.gz

#cd nrpe-2.13

#./configure --prefix=/usr/local/nagios

#make all

#make install-plugin

#make install-daemon

#make install-daemon-config

#chown -R nagios:nagios /usr/local/nagios

Start nrpe through xinetd

#vi /etc/xinetd.d/nrpe

service nrpe

{

flags = REUSE

socket_type = stream

port = 5666

wait = no

user = nagios

group = nagios

server = /usr/local/nagios/bin/nrpe

server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd

log_on_failure += USERID

disable = no

only_from = 127.0.0.1 172.26.188.201

}

Add the following line at the appropriate place in /etc/services:

nrpe 5666/tcp # NRPE

Start nrpe

service xinetd restart

Configuration

On the monitored host, the following files are modified:

/usr/local/nagios/etc/nrpe.cfg

/etc/hosts.allow

/etc/rc.local

1) Configure NRPE

First check that the monitoring plugin scripts exist under /usr/local/nagios/libexec. These are the scripts that the commands defined in nrpe.cfg call.

vi /usr/local/nagios/etc/nrpe.cfg

allowed_hosts=172.26.188.201 # IP of the monitoring server

server_address=172.26.188.202 # IP of this host

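Add custom commands to nrpe.cfg

Take the following command as an example:

command[check_sdb2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sdb2

This monitors disk usage. check_sdb2 is a name you choose yourself, and check_disk is the plugin script being called. -w 20% raises a warning when less than 20% of the space is free, -c 10% raises a critical state when less than 10% is free, and -p specifies the disk to monitor.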
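To make the -w and -c thresholds concrete, here is a minimal sketch of the decision they describe. disk_state is a hypothetical helper written for this post, not part of nagios-plugins, and it assumes the free space is already known as a whole-number percentage. The return values follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```shell
#!/bin/sh
# Hypothetical helper (not part of nagios-plugins): map a free-space
# percentage to a Nagios state, mimicking "-w 20% -c 10%" semantics.
disk_state() {
    free_pct=$1   # percent of the partition still free (integer)
    warn_pct=$2   # warning threshold, e.g. 20 for "-w 20%"
    crit_pct=$3   # critical threshold, e.g. 10 for "-c 10%"

    # Check the critical threshold first, since it is the stricter one.
    if [ "$free_pct" -lt "$crit_pct" ]; then
        echo "CRITICAL"
        return 2
    elif [ "$free_pct" -lt "$warn_pct" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}
```

For example, disk_state 15 20 10 reports WARNING, because 15% free is below the warning threshold but still above the critical one.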

2) Edit /etc/hosts.allow to add the monitoring server's IP

echo 'nrpe:<monitoring server IP>' >> /etc/hosts.allow

3) Start the NRPE daemon:

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

4) Start NRPE automatically at boot.

echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local

Check the corresponding port: netstat -an | grep 5666

You should see that port 5666 is open.
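A plain grep 5666 also matches established connections and unrelated ports such as 45666. If you want a stricter check, something like the sketch below may help. port_listening is a hypothetical helper, and it assumes Linux-style netstat -an output where the fourth column is the local address and the last column is the state.

```shell
#!/bin/sh
# Hypothetical helper: read "netstat -an"-style lines on stdin and succeed
# only if some TCP socket is in LISTEN state on port $1.
port_listening() {
    port=$1
    awk -v p="$port" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            # The local address looks like 0.0.0.0:5666 or :::5666;
            # the port is whatever follows the last colon.
            n = split($4, a, ":")
            if (a[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (run on the monitored host):
# netstat -an | port_listening 5666 && echo "nrpe is listening"
```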

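Key points: when adding a monitored host, first make sure the command line has been added to nrpe.cfg on the monitored host, which in turn requires the corresponding plugin script to exist under /usr/local/nagios/libexec on that host. Second, command.cfg on the monitoring server must contain the matching definition, that is, the monitoring commands are added there according to the nrpe definitions. Finally, the specific items to monitor are defined in services.cfg on the monitoring server.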
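The first of these checks can be scripted on the monitored host. The sketch below is only an illustration: nrpe_command_defined and nrpe_command_plugin are hypothetical helper names, and they assume each command sits on one line in the form command[name]=/path/to/plugin args.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the nrpe.cfg file ($1) contains an
# uncommented definition of command[$2].
nrpe_command_defined() {
    cfg=$1
    name=$2
    grep -q "^[[:space:]]*command\[$name]=" "$cfg"
}

# Hypothetical helper: print the plugin path that command[$2] in $1 calls
# (the first word after the "=" sign).
nrpe_command_plugin() {
    cfg=$1
    name=$2
    sed -n "s/^[[:space:]]*command\[$name]=\([^[:space:]]*\).*/\1/p" "$cfg" | head -n 1
}

# Usage (on the monitored host):
# cfg=/usr/local/nagios/etc/nrpe.cfg
# if nrpe_command_defined "$cfg" check_sdb2; then
#     plugin=$(nrpe_command_plugin "$cfg" check_sdb2)
#     [ -x "$plugin" ] || echo "plugin missing: $plugin"
# fi
```

From the monitoring server, the same command can then be exercised end to end with check_nrpe, for example /usr/local/nagios/libexec/check_nrpe -H 172.26.188.202 -c check_sdb2 (the plugin path here assumes the same install prefix as on the monitored host).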

Nagios automatic event handling

Running the event_handler over SSH from Nagios

Nagios uses event handlers to attempt some preliminary repair of a problem before anyone receives a notification.

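An event handler command can be a shell or Perl script. The script should handle the following macros:
For services: $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$;
For hosts: $HOSTSTATE$, $HOSTSTATETYPE$ and $HOSTATTEMPT$.
The script must examine these values, which are passed in as command-line arguments, and take whatever action they call for.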
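As a rough illustration of what "examining the arguments" means, the sketch below isolates the decision from the action. should_restart is a hypothetical function, and the argument order (state, state type, attempt) is assumed to match how the handler is invoked. The rule itself (act on the second soft CRITICAL attempt, or on any hard CRITICAL state) is an assumption based on the header comment of the restart-httpd script in this post, not a fixed Nagios behaviour.

```shell
#!/bin/sh
# Hypothetical decision helper for a service event handler.
# $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$, $3 = $SERVICEATTEMPT$
should_restart() {
    state=$1
    state_type=$2
    attempt=$3

    case "$state" in
        CRITICAL)
            case "$state_type" in
                SOFT)
                    # Assumed rule: ignore the first failed check (it may
                    # be a fluke) and act on the second one.
                    [ "$attempt" -eq 2 ] && return 0
                    return 1
                    ;;
                HARD)
                    # Retries are exhausted, so a restart is worth trying.
                    return 0
                    ;;
            esac
            ;;
    esac
    # OK, WARNING, UNKNOWN and anything unexpected: do nothing.
    return 1
}
```

Keeping the decision in its own function makes it easy to try out by hand, for example should_restart CRITICAL SOFT 2 && echo "would restart", before wiring in the real restart command.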

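The following example has Nagios monitor the Apache service and start Apache automatically when it stops. The detailed configuration steps are as follows:

1) Configure passwordless login from the Nagios monitoring server to the remote monitored host

- Generate the ssh key files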

#su - nagios

$ ssh-keygen -t rsa

# Press Enter at every prompt below; do not set a passphrase

Generating public/private rsa key pair.

Enter file in which to save the key (/home/nagios/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/nagios/.ssh/id_rsa.

Your public key has been saved in /home/nagios/.ssh/id_rsa.pub.

The key fingerprint is:

d2:82:61:12:53:f9:53:75:77:8d:32:c0:ca:c8:20:60 nagios@nagios.itech.com

- Copy the generated public key to the remote monitored host

$ scp .ssh/id_rsa.pub nagios-test1:/usr/local/nagios/

- Install the public key on the monitored host you will log in to

$ ssh nagios@nagios-test1

nagios@nagios-test1's password:

$ cat id_rsa.pub >> .ssh/authorized_keys

$ chmod 600 .ssh/authorized_keys

$ exit

- Test the passwordless login

$ ssh nagios@nagios-test1
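An interactive ssh session will silently fall back to a password prompt if the key is not accepted, so the test can be misleading. A non-interactive check along the following lines may be more reliable for something Nagios will run unattended. can_login_without_password is a hypothetical wrapper; BatchMode and ConnectTimeout are standard OpenSSH client options, and the 5 second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if key-based login to $1 works
# without any interactive prompt.
can_login_without_password() {
    target=$1
    # BatchMode=yes makes ssh fail instead of asking for a password;
    # ConnectTimeout bounds the wait if the host is unreachable.
    # "true" is the remote command, so a working login exits with 0.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true
}

# Usage (as the nagios user on the monitoring server):
# if can_login_without_password nagios@nagios-test1; then
#     echo "key login works"
# else
#     echo "key login failed, check authorized_keys and its permissions"
# fi
```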

2) Configure sudo on the remote monitored host

Allow the nagios user to run the /usr/local/nagios/libexec/eventhandlers/restart-httpd script as root.

# visudo

Add the following line:

nagios ALL=(root) NOPASSWD:/usr/local/nagios/libexec/eventhandlers/restart-httpd

Comment out the following line:

#Defaults requiretty

3) Write the Apache restart script on the remote host
vi /usr/local/nagios/libexec/eventhandlers/restart-httpd

Its contents are as follows:

#!/bin/sh

#

# Event handler script for restarting the Apache server on the remote machine

#

# Note: This script will only restart the Apache server if the service is

# retried 2 times (in a "soft" state) or if the web service somehow

# manages to fall into a "hard" error state.

#

# What state is the Apache service in?

case "$1" in

OK)

;;

WARNING)

;;

UNKNOWN)

;;

CRITICAL)

# Is this a "soft" or a "hard" state?

case "$2" in

SOFT)

# What check attempt are we on? We don't want to restart the Apache server on the first

# check, because it may just be a fluke!

case "$3" in

2)

To be continued: http://ITIL-foundation.cn/thread-53045-1-1.html