Instructions for how to set up the watchdog daemon to work with IPMI's hardware watchdog
----------------------------------------------------------------------------------------

First, verify that the ipmitool utility is present on the system (run
"which ipmitool"). It allows the hardware watchdog timer to be turned off
gracefully from the command line should that ever become necessary. If
ipmitool is not present, install it, or download the latest version from
http://ipmitool.sourceforge.net and build and install it on your system.

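The presence check can also be scripted. The following is a minimal sketch,
not part of the original instructions: the helper name have_tool is
hypothetical, and it uses "command -v" as a portable stand-in for "which".

```shell
#!/bin/sh
# Succeed when the named tool is on PATH.
# $1 is the tool name; it defaults to ipmitool.
have_tool() {
    command -v "${1:-ipmitool}" >/dev/null 2>&1
}

if have_tool ipmitool; then
    echo "ipmitool found at: $(command -v ipmitool)"
else
    echo "ipmitool not found; install it before enabling the watchdog" >&2
fi
```

Running this before any of the steps below shows early whether the
emergency "turn the timer off" path will be available.
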
Next, before starting the watchdog daemon, three things must be done: the
BMC BIOS must be set to enable the IPMI/BMC hardware watchdog timer, the
OpenIPMI watchdog driver module must be inserted with the desired
configuration/startup settings, and the watchdog daemon's configuration
file must be modified to use /dev/watchdog:

1. To set up the IPMI/BMC BIOS to enable the hardware watchdog
   timer, see the BMC documentation. The main settings in the BMC BIOS
   that must be changed to turn on the IPMI watchdog timer are:

   - Set the BMC POST Watchdog to "ENABLED".
   - Set the BMC POST Watchdog Timeout to "5 Minutes".

2. To insert the OpenIPMI watchdog driver module with the
   desired configuration settings, two steps are necessary:

   i.) Configure the OpenIPMI watchdog driver by editing the
       /etc/sysconfig/ipmi configuration file:

       - Set "IPMI_WATCHDOG=yes".
       - Set desired options via the IPMI_WATCHDOG_OPTIONS
         config entry.

         EXAMPLE: 'IPMI_WATCHDOG_OPTIONS="timeout=60 start_now=1 \
         preop=preop_give_data action=power_cycle pretimeout=1" '

         Execute "modinfo ipmi_watchdog" for more detailed information
         on the available ipmi watchdog timer options.

       - Execute "service ipmi start" (the watchdog driver starts
         automatically along with the other ipmi drivers).

       IMPORTANT: If "start_now=1" has been set as one of the
       configuration options, be sure to start up the watchdog
       daemon before the BMC timer expires!

   ii.) Set the OpenIPMI daemon and watchdog to start during bootup:

        - chkconfig ipmi on
        - chkconfig watchdog on

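Before loading the driver it can help to sanity-check the option string.
The sketch below is illustrative only: the helper wd_opt is hypothetical,
the option string is the EXAMPLE from step 2 written on one line, and the
rule that pretimeout should be smaller than timeout is an assumption drawn
from what the two options mean rather than something stated above.

```shell
#!/bin/sh
# Print the value of KEY from a space-separated "key=value" option string.
# Usage: wd_opt KEY "OPTIONS"; returns non-zero when KEY is absent.
wd_opt() {
    key=$1
    for kv in $2; do
        case $kv in
            "$key"=*) printf '%s\n' "${kv#*=}"; return 0 ;;
        esac
    done
    return 1
}

# The option string from the EXAMPLE, joined onto a single line.
IPMI_WATCHDOG_OPTIONS="timeout=60 start_now=1 preop=preop_give_data action=power_cycle pretimeout=1"

wd_timeout=$(wd_opt timeout "$IPMI_WATCHDOG_OPTIONS")
wd_pretimeout=$(wd_opt pretimeout "$IPMI_WATCHDOG_OPTIONS")

# Assumption: the pretimeout must leave room before the timeout itself.
if [ "$wd_pretimeout" -lt "$wd_timeout" ]; then
    echo "ok: pretimeout=$wd_pretimeout s, timeout=$wd_timeout s"
else
    echo "warning: pretimeout should be smaller than timeout" >&2
fi

if [ "$(wd_opt start_now "$IPMI_WATCHDOG_OPTIONS")" = "1" ]; then
    echo "start_now=1: start the watchdog daemon before the timer expires (timeout=$wd_timeout s)"
fi
```

The same helper can be pointed at whatever string you actually place in
/etc/sysconfig/ipmi; "modinfo ipmi_watchdog" remains the authority on which
options exist.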

3. Configure the watchdog daemon by editing the
   /etc/watchdog.conf configuration file:

   - Uncomment the "watchdog-device = /dev/watchdog" line.
   - Ensure that "realtime = yes" and "priority = 1" are set and not
     commented out.
   - Uncomment the "interval" line, and set the interval to be less
     than the timeout option you set in the /etc/sysconfig/ipmi
     file (e.g. with "timeout=60" you might set interval to 50).

   So in the example described herein, the BMC BIOS setting is in
   minutes (5), and the "interval" and ipmi_watchdog "timeout" settings
   are both in seconds (50 and 60 respectively). Therefore, the BMC
   hardware watchdog timer is set to expire and trigger a system power
   cycle unless reset by the watchdog daemon within 5 minutes, and the
   watchdog daemon will reset the timer every 50 seconds (the "interval"),
   which is inside the 60-second ipmi_watchdog timeout.

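The "interval must be less than timeout" rule from step 3 can be checked
mechanically. This is a rough sketch under stated assumptions: the function
names are hypothetical, the parsing only handles an uncommented
"interval = N" line, and it expects "timeout=N" to sit on the same line as
IPMI_WATCHDOG_OPTIONS= (as in the EXAMPLE above).

```shell
#!/bin/sh
# Print the last uncommented "interval = N" value from a watchdog.conf-style
# file. $1 is the file path.
conf_interval() {
    sed -n 's/^[[:space:]]*interval[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Print the "timeout=N" value from the IPMI_WATCHDOG_OPTIONS line of a
# sysconfig-style file, ignoring "pretimeout=". $1 is the file path.
ipmi_timeout() {
    sed -n '/^[[:space:]]*IPMI_WATCHDOG_OPTIONS=/s/.*[^a-z_]timeout=\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Succeed only when the daemon interval is strictly below the driver timeout.
# $1 = watchdog.conf path, $2 = sysconfig ipmi path.
check_interval() {
    interval=$(conf_interval "$1")
    wd_timeout=$(ipmi_timeout "$2")
    if [ -z "$interval" ] || [ -z "$wd_timeout" ]; then
        echo "could not read interval or timeout" >&2
        return 2
    fi
    if [ "$interval" -lt "$wd_timeout" ]; then
        echo "ok: interval=$interval s < timeout=$wd_timeout s"
    else
        echo "bad: interval=$interval s must be less than timeout=$wd_timeout s" >&2
        return 1
    fi
}

# Example invocation with the paths used in this document:
# check_interval /etc/watchdog.conf /etc/sysconfig/ipmi
```

With the values from this document (interval 50, timeout 60) the check
succeeds; swapping them makes it fail, which is the misconfiguration that
would let the timer expire between two resets.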

4. Start the watchdog daemon:

   - Execute "service watchdog start".


IMPORTANT: To gracefully stop/kill the watchdog daemon, be sure
to use "service watchdog stop" (which executes "kill -s SIGTERM <pid>")
and do *not* use "kill -9 <pid>". Using "kill -9 <pid>" shuts the
daemon off without stopping the BMC's watchdog timer, so a system
reboot will be triggered when the BMC's watchdog timer expires.

Alternatively, or in case the watchdog daemon is killed "ungracefully",
you can stop the BMC timer by executing the following ipmitool
command before the watchdog timer expires:

    # ipmitool -v raw 0x06 0x24 0x04 0x01 0x00 0x10 0x00 0x0a

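For readers wondering what the raw bytes mean, here is a small decoding
sketch. It is based on one reading of the "Set Watchdog Timer" command
(NetFn 0x06, command 0x24) in the IPMI specification, so treat the field
meanings in the comments as assumptions to verify against the spec; the
function names are hypothetical. Under that reading the command reprograms
the timer as an SMS/OS timer with a hard-reset action, and because the
"don't stop" bit in the first data byte is clear, the timer is left stopped.

```shell
#!/bin/sh
# Data bytes of the raw command above, in order:
#   0x04       timer use (low 3 bits)
#   0x01       timeout action (low 3 bits)
#   0x00       pre-timeout interval, seconds
#   0x10       timer-use expiration flags to clear
#   0x00 0x0a  initial countdown, low byte first, 100 ms per count

# Decode the timer-use field. $1 = first data byte.
wd_timer_use() {
    case $(( $1 & 0x07 )) in
        1) echo "BIOS FRB2" ;;
        2) echo "BIOS/POST" ;;
        3) echo "OS Load" ;;
        4) echo "SMS/OS" ;;
        5) echo "OEM" ;;
        *) echo "reserved" ;;
    esac
}

# Decode the timeout action. $1 = second data byte.
wd_timeout_action() {
    case $(( $1 & 0x07 )) in
        0) echo "no action" ;;
        1) echo "hard reset" ;;
        2) echo "power down" ;;
        3) echo "power cycle" ;;
        *) echo "reserved" ;;
    esac
}

# Convert the countdown to whole seconds. $1 = low byte, $2 = high byte.
wd_countdown_seconds() {
    echo $(( (($2 << 8) | $1) / 10 ))
}

echo "timer use: $(wd_timer_use 0x04)"
echo "action:    $(wd_timeout_action 0x01)"
echo "countdown: $(wd_countdown_seconds 0x00 0x0a) s"
```

Recent ipmitool builds also appear to provide "ipmitool mc watchdog get"
and "ipmitool mc watchdog off" for the same purpose; check the help output
of your installed version before relying on them.
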
----------------------------------------------------------------------

To test the watchdog after system configuration and setup:

. Use "kill -9" on the watchdog daemon so that it cannot shut down the
  BMC watchdog timer gracefully. Verify that the system gets reset after
  the BMC timer expires.

. Use "service watchdog stop" and verify that the watchdog daemon shuts off
  the BMC watchdog timer gracefully (the system doesn't get reset).

. Set the timer on the watchdog daemon to be greater than the time set in
  the BMC BIOS for system reset and verify that the system is reset.

. Set the timer on the daemon to be less than the time set in the
  BMC timer and verify that the BMC watchdog is poked regularly and the
  system is not reset.

. Test some of the other actions the BMC can take when the watchdog timer
  goes off (see "modinfo ipmi_watchdog" for some other settings to try).
