1/19 – CRM Down

CRM experienced periodic outages on the morning of January 19, continuing through 1:30 pm. Information Services was notified of the problem at approximately 11:00 am and began troubleshooting. The cause was identified at 12:15 pm, and a resolution was implemented by approximately 1:15 pm. The system was fully functional by 2:00 pm.

The failure was caused by uncompiled objects in the database. The root issue that caused the objects to become invalid is unknown.

6/3 – Database Systems Temporarily Offline

The database systems went offline sometime on 3 June, over the weekend, when rainier and crystal lost their connections to several disk volumes. The disks were remounted, and the databases were restarted. The DBA was called at 9 AM this morning.

The www2 webserver also became unresponsive because it lost connection to the database. The webserver was restarted once the database came online, restoring service.
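
As a follow-up note on the webserver restart: one common way to avoid a manual restart is for the application to retry its database connection with an increasing delay. The sketch below is illustrative only and is not how www2 is configured. The function name connect_with_retry and its parameters are hypothetical, and connect stands in for whatever call the application actually uses to open its database connection.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect    -- hypothetical zero-argument callable that opens the
                  database connection and raises ConnectionError on failure
    attempts   -- maximum number of tries (must be at least 1)
    base_delay -- seconds to wait after the first failure; doubles each time
    sleep      -- injectable so the wait can be stubbed out in tests
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Wait before the next try, but not after the final failure.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)

    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With the defaults, a database that stays down causes the caller to wait 1, 2, 4, and 8 seconds between tries before the last error is raised. Whether this is appropriate depends on how the webserver holds its connections, which this log does not record.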

5/30 – License problems with the FTP server on www2

Sometime during the weekend, the ftp server on www2 began reporting that its license had expired. Until a new license could be obtained, the standard ftp server (wu-ftpd) was run in its place. This led to slow response times for much of the day, since wu-ftpd operates under the xinetd master daemon. The vendor provided a new license key at 4:30 PM. The key was installed, and ftp service was restored to normal.

5/24 – MERLIN2 failure

Merlin2 became unresponsive to fileshare access today at 5:30 PM. The console was still responding, and we were able to log on. Access to the disk arrays appeared to be impaired – we were unable to list the disks or view their contents. No pertinent events were logged in the Event Log. The system was rebooted and returned to normal operation.

We have made an adjustment to the antivirus software (changed vendors), and will keep monitoring.