The University’s main web site began returning 500 errors about 8 PM. A restart of the web server software corrected it.
Category Archives: Failures
1/23/2008 Merlin2 Freezing Events
Merlin2 froze twice around the noon hour today. Indications are that it is a hardware issue that shows up under load, but so far no part of the hardware appears to be malfunctioning, and no pertinent events have been logged. This issue will be resolved by the replacement of the Merlin2 hardware in mid-February.
In the meantime, everyone has been advised to copy documents to workstations while they are being worked on, to minimize the chance of data loss.
10/24 Alexandria crashes
I have recorded Alexandria crashes at:
11 AM on Monday 10/22
11:50 AM on Tuesday 10/23
4:25 PM on Tuesday 10/23
1:15 PM on Wednesday 10/24
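To get a rough sense of how often the crashes occur, the times above can be converted into intervals. The following is only a minimal Python sketch. It assumes the dates fall in 2007 (this entry does not state the year), and the names CRASHES, crash_intervals, and mean_interval are made up for illustration.

```python
from datetime import datetime, timedelta

# Crash times recorded above. The year 2007 is an assumption,
# since the entry gives only month and day.
CRASHES = [
    datetime(2007, 10, 22, 11, 0),
    datetime(2007, 10, 23, 11, 50),
    datetime(2007, 10, 23, 16, 25),
    datetime(2007, 10, 24, 13, 15),
]


def crash_intervals(times):
    """Return the gaps between consecutive crashes, oldest first."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def mean_interval(times):
    """Return the average gap between consecutive crashes."""
    gaps = crash_intervals(times)
    return sum(gaps, timedelta(0)) / len(gaps)


if __name__ == "__main__":
    for gap in crash_intervals(CRASHES):
        print("gap between crashes:", gap)
    print("mean gap:", mean_interval(CRASHES))
```

With these four data points the average gap works out to 16 hours 45 minutes, though three intervals are far too few to call a pattern.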
10/22/2007 – Alexandria freezes and reboots
Beginning on Wednesday, October 17, Alexandria started freezing, sometimes rebooting itself, and sometimes requiring a manually-forced reboot. We don’t know what is causing this behavior. One possible cause is a RAID firmware upgrade recommended by Dell. This was done on October 10, one week before.
There are no log entries or any other indications of what might be wrong.
At this point we are going to watch the system to try to determine the frequency of the incidents. We may wish to schedule a rollback of the RAID firmware.
Serious Ingeniux outage
Today the pages on the main university web server began to fail due to problems with the Ingeniux CMS system. The CMS system itself could not be started to correct this issue. As a result, the content in the CMS had to be restored from backup. Unfortunately, changes made to the website from within the CMS may have been lost. If you have lost content, it may be possible to restore it. Please contact Jean Huskamp at x3773 for more information.
Internet performance issues
We are experiencing increased round-trip times to the internet. This is affecting browsing and downloading of files. I am working with Cisco to find the cause and resolve it as soon as possible. When trying to browse to some websites, you may notice messages such as “unable display page”. We are experiencing unusually heavy data traffic, which has brought these issues to the surface. We will work through the problems and keep you posted.
09/10 Camano Failure
Camano failed due to process and memory overload caused by a failure to properly process bounceback messages from CRM.
7/5/2007 Bug found in redirector application
The redirector web application at http://webmail.ups.edu/redirector was found to fail when a user’s full name contains an apostrophe. Refactoring is underway.
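The entry above does not say why the apostrophe breaks the redirector. One common cause of this kind of failure is pasting a name directly into a quoted string, for example in a SQL query. The sketch below illustrates that general pattern in Python with an in-memory SQLite database. It is purely an assumption for illustration: the table, the sample users, and the function names are hypothetical and are not taken from the redirector itself.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with two made-up users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, full_name TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("pobrien", "Pat O'Brien"), ("jsmith", "Jan Smith")],
    )
    return conn


def find_user_unsafe(conn, full_name):
    # Hypothetical buggy pattern: the name is pasted into the SQL text,
    # so an apostrophe in the name ends the string literal early.
    query = "SELECT username FROM users WHERE full_name = '" + full_name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, full_name):
    # Parameterized query: the driver passes the name as data,
    # so apostrophes need no special handling.
    query = "SELECT username FROM users WHERE full_name = ?"
    return conn.execute(query, (full_name,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_user_safe(conn, "Pat O'Brien"))
    try:
        find_user_unsafe(conn, "Pat O'Brien")
    except sqlite3.OperationalError as err:
        print("unsafe lookup failed:", err)
```

If the real cause is similar, passing the name as a parameter instead of building the string by hand would likely avoid the failure, and it also closes off injection through the name field.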
Cascade Web Unplanned Downtime
Cascade Web came down 3/26 at 4:40 PM following a restart of the Apache server. The restart was intended to restore Banner and Famis, which had been running on a backup server since Friday, to the camano server. Cascade Web was unable to start following this restart and remained down until 10:30 PM.
Other components of the Application Server, including Discoverer and Portal, remain down as of 8:15 AM on 3/27.
February 19th – CMS Publishing failed
Monday morning at 7:30 AM, a publish on the CMS system failed, corrupting the University’s web site. The system was restored after about one hour.
Dependency graphs were rebuilt, and the publish target was copied directly to the live web site to expedite recovery.
1/11 Internet Connection Down Briefly Last Night
At approximately midnight last night, the University’s Internet connection was unavailable for 30-40 minutes, because of scheduled maintenance on our ISP’s routers.
6/3 – Database Systems Temporarily Offline
The database systems went offline sometime on 3 June this weekend when rainier and crystal lost connections to several disk volumes. The DBA was called at 9 AM this morning. The disks were remounted, and the databases were restarted.
The www2 webserver also became unresponsive because it lost connection to the database. The webserver was restarted once the database came online, restoring service.
5/30 – License problems with the FTP server on www2
Sometime during the weekend, the ftp server on www2 decided that its license had expired. Until a new license could be obtained, the standard ftp server (wu-ftpd) was run. This led to some slow response (wu-ftpd operates under the xinetd master daemon) for much of the day. The vendor provided a new license key at 4:30 PM. This was installed, and ftp service was restored to normal.