Monitoring and Testing
The Dryad production system is monitored by several services, each of which tests a different aspect of the system.
Nagios runs on NESCent systems to verify that major Dryad functionality is in place. When it detects an error, it sends emails and text messages to the appropriate personnel.
Nagios performs these checks:
- the machine is running
- the Dryad home page responds
- searches return results
- frequency of error messages in the Dryad log files
- the number of processes on the host is below a user-defined threshold
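The last two checks compare a measured value against thresholds. Nagios plugins report their result through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and print a single status line, optionally followed by performance data after a "|". The following is a minimal sketch of that logic, assuming a plain-text log in which error lines contain the marker "ERROR"; the function names, the marker, and the threshold values are hypothetical and not taken from Dryad's actual Nagios configuration.

```python
from __future__ import annotations

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(label: str, value: int, warn: int, crit: int) -> tuple[int, str]:
    """Compare a measured value against warning/critical thresholds.

    Values strictly above `crit` are CRITICAL; values strictly above `warn`
    (but not above `crit`) are WARNING; everything else is OK.
    """
    if warn > crit:
        # Misconfigured thresholds: report UNKNOWN rather than guessing.
        return UNKNOWN, f"UNKNOWN - {label}: warning threshold {warn} exceeds critical {crit}"
    if value > crit:
        status = CRITICAL
    elif value > warn:
        status = WARNING
    else:
        status = OK
    # One status line, with performance data (label=value;warn;crit) after the pipe.
    return status, f"{STATUS_NAMES[status]} - {label}={value} | {label}={value};{warn};{crit}"


def count_errors(log_lines, marker: str = "ERROR") -> int:
    """Count log lines containing the (hypothetical) error marker."""
    return sum(1 for line in log_lines if marker in line)
```

A wrapper script would read the recent portion of the log (or count processes), call `evaluate`, print the returned message, and exit with the returned status so that Nagios can decide whether to send alerts. For example, two error lines against thresholds of 1 (warning) and 5 (critical) produce `WARNING - errors=2 | errors=2;1;5`.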
Local configuration parameters for Nagios are in /etc/nagios/nrpe.cfg.
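In nrpe.cfg, general settings are written as `key=value` lines and each local check is defined as `command[name]=command line`. The sketch below lists the settings and command definitions found in such a file, which can help when reviewing which checks run on a host. It is a simplified reader, not NRPE's own parser: it ignores `include`/`include_dir` directives (repeated keys simply overwrite earlier ones), and the command name `check_dryad_procs` used in the example is hypothetical.

```python
from __future__ import annotations

import re

# Matches lines such as: command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]\s*=\s*(?P<cmdline>.+)$")


def parse_nrpe_config(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split nrpe.cfg content into (settings, commands) dictionaries."""
    settings: dict[str, str] = {}
    commands: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group("name")] = match.group("cmdline").strip()
        elif "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings, commands


def load_nrpe_config(path: str = "/etc/nagios/nrpe.cfg"):
    """Read and parse the local NRPE configuration file."""
    with open(path, encoding="utf-8") as handle:
        return parse_nrpe_config(handle.read())
```

For instance, a line `command[check_dryad_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200` would appear in the returned `commands` dictionary under the key `check_dryad_procs`.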
All Dryad-related Nagios checks can be viewed in the Nagios web interface (password protected).
Nagios also performs very high-level tests of the non-production systems.
NCSU runs Hyperic HQ on the production server to monitor its internal status (memory, CPU usage, etc.).
The Hyperic system is at https://spectre.lib.ncsu.edu/ (password protected).
Dryad's DNS entries are managed by MCNC. When an issue is detected (i.e., when the homepage doesn't respond), all traffic is rerouted to a secondary server. For details, see the Failover page.
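The failover rule described above (reroute traffic when the home page stops responding) can be illustrated with a small health-check state machine. This is a hypothetical sketch, not MCNC's actual mechanism: it assumes a probe that treats any 2xx/3xx response as healthy, and it requires several consecutive failures before switching to the secondary server (and several consecutive successes before switching back) so that a single slow request does not cause traffic to flap between servers. The host names and threshold values are placeholders.

```python
from __future__ import annotations

import http.client
import urllib.request
from dataclasses import dataclass


def homepage_responds(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and timeouts are OSError subclasses; ValueError covers malformed URLs.
        return False


@dataclass
class FailoverMonitor:
    primary: str
    secondary: str
    fail_threshold: int = 3      # consecutive failed probes before failing over
    recover_threshold: int = 3   # consecutive good probes before failing back
    consecutive_failures: int = 0
    consecutive_successes: int = 0
    active: str = ""

    def __post_init__(self) -> None:
        if not self.active:
            self.active = self.primary

    def record_probe(self, primary_ok: bool) -> str:
        """Record one probe result for the primary and return the server that should receive traffic."""
        if primary_ok:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if self.active == self.secondary and self.consecutive_successes >= self.recover_threshold:
                self.active = self.primary
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.active == self.primary and self.consecutive_failures >= self.fail_threshold:
                self.active = self.secondary
        return self.active
```

In use, a scheduler would call `record_probe(homepage_responds("https://primary.example.org/"))` at a fixed interval and update the DNS record whenever the returned server changes.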
DataONE will run a process to monitor the status of the DataONE interface to Dryad.