1/23/2008 Merlin2 Freezing Events

Merlin2 froze twice around the noon hour today. Indications are that it is a hardware issue that shows up under load, but so far no part of the hardware appears to be malfunctioning, and no pertinent events have been logged. This issue will be resolved by the replacement of the Merlin2 hardware in mid-February.

In the meantime, everyone has been advised to copy documents to workstations while they are being worked on, to minimize the chance of data loss.

10/22/2007 – Alexandria freezes and reboots

Beginning on Wednesday, October 17, Alexandria started freezing, sometimes rebooting itself, and sometimes requiring a manually-forced reboot. We don’t know what is causing this behavior. One possible cause is a RAID firmware upgrade recommended by Dell. This was done on October 10, one week before.

There are no log entries or any other indications of what might be wrong.

At this point we are going to watch the system to try to determine the frequency of the incidents. We may wish to schedule a rollback of the RAID firmware.
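
As a rough illustration of how the frequency of the incidents could be tracked, the sketch below computes the gaps between recorded freeze times and their average. It is only a sketch: the function names are hypothetical, and the timestamps are made-up placeholder values, not the actual times Alexandria froze.

```python
from datetime import datetime, timedelta


def incident_intervals(timestamps):
    """Return the time gaps between consecutive incidents, oldest first."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def mean_interval(timestamps):
    """Return the average gap between incidents, or None with fewer than two."""
    gaps = incident_intervals(timestamps)
    if not gaps:
        return None
    return sum(gaps, timedelta()) / len(gaps)


# Placeholder values for illustration only; these are NOT the real incident times.
incidents = [
    datetime(2007, 10, 17, 9, 30),
    datetime(2007, 10, 18, 14, 0),
    datetime(2007, 10, 20, 11, 15),
]

for gap in incident_intervals(incidents):
    print("Gap between incidents:", gap)
print("Average gap:", mean_interval(incidents))
```

A shrinking average gap would suggest the problem is getting worse and would strengthen the case for scheduling the firmware rollback sooner.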

Serious Ingeniux outage

Today the pages on the main university web server began to fail due to problems with the Ingeniux CMS. The CMS itself could not be started to correct this issue. As a result, the contents of the CMS had to be restored from backup. Unfortunately, changes made to the website from within the CMS may have been lost. If you have lost content, it may be possible to restore it. Please contact Jean Huskamp at x3773 for more information.

Internet performance issues

We are experiencing increased round-trip times to the internet. This is affecting browsing and downloading of files. I am working with Cisco to find the cause and resolve it as soon as possible. When trying to browse to some websites, you may notice messages such as “unable to display page”. Unusually heavy data traffic has brought these issues to the surface. We will work through the problems and keep you posted.

Cascade Web Unplanned Downtime

Cascade Web went down on 3/26 at 4:40pm following a restart of the Apache server. The restart was intended to restore Banner and Famis to the camano server; they had been running on a backup server since Friday. Cascade Web was unable to start following this restart and remained down until 10:30pm.

Other components of the Application Server including Discoverer and Portal remain down as of 8:15am on 3/27.

February 1st – PureMessage Problems

Due to a problem with a system update that was done almost one year ago, the PureMessage quarantine has not been expiring properly, causing a buildup of spam messages in the quarantine and an inflation of the PM database. This apparently caused the system to “hold back” certain messages, generally ones that originated from listservs.

In order to correct this issue, we are currently running processes that will properly expire and reindex the quarantine and the metadata in the database. As this proceeds, the held-back messages are being delivered, so users will see old emails appear in their inboxes. The number of affected messages seems small. Most people are not seeing any old messages appear, but many are. As stated above, most of the affected messages appear to be from listservs and email subscription services.
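
For readers curious about what "expiring" a quarantine means in general, the sketch below shows age-based expiry: anything held longer than a retention period is dropped. This is not PureMessage's actual mechanism or API. The function name, the message fields, and the 30-day retention period are all assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def split_expired(messages, now, retention):
    """Split quarantined messages into (kept, expired) by age.

    `messages` is a list of dicts with a hypothetical "quarantined_at" field.
    """
    cutoff = now - retention
    kept, expired = [], []
    for msg in messages:
        if msg["quarantined_at"] < cutoff:
            expired.append(msg)
        else:
            kept.append(msg)
    return kept, expired


# Illustrative data only; the retention period is an assumed value.
now = datetime(2008, 2, 1)
quarantine = [
    {"id": 1, "quarantined_at": datetime(2007, 3, 15)},
    {"id": 2, "quarantined_at": datetime(2008, 1, 25)},
]
kept, expired = split_expired(quarantine, now, retention=timedelta(days=30))
print("Kept:", [m["id"] for m in kept])
print("Expired:", [m["id"] for m in expired])
```

When this kind of expiry stops running, old entries are never removed, which matches the buildup described above.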

6/3 – Database Systems Temporarily Offline

The database systems went offline sometime on 3 June, over the weekend, when rainier and crystal lost their connections to several disk volumes. The disks were remounted, and the databases were restarted. The DBA was called at 9 AM this morning.

The www2 webserver also became unresponsive because it lost connection to the database. The webserver was restarted once the database came online, restoring service.

5/30 – License problems with the FTP server on www2

Sometime during the weekend, the ftp server on www2 decided that its license had expired. Until a new license could be obtained, the standard ftp server (wu-ftpd) was run. This led to some slow response (wu-ftpd operates under the xinetd master daemon) for much of the day. The vendor provided a new license key at 4:30 PM. This was installed, and ftp service was restored to normal.