The mailer functionality serving Cascade and CRM has been migrated from Camano to Orcas. This required a DNS change for cascademail.pugetsound.edu and a small code change in Cascade.
A database error occurred in CRM at about 11:30 AM on 4/15/2009. DST was alerted to the problem (a table that was unable to extend) and fixed it at about noon. A couple of email campaigns were in progress and were adversely affected:
1. A message from the President’s office to faculty, staff and students was sent twice, although the records indicated it went out only once.
2. A message being created by Admission got stuck while generating its target group. Every attempt to resolve this failed, so the workaround was to copy the schedule without the target group and then re-create the target group. The email was then sent successfully.
[Update 3/19/09 4:20 PM] Services have been restored. The application services were stopped and restarted. Root cause analysis is underway.
The application server that hosts Cascade is currently unavailable. Impacted services include Cascade, Portal, Discoverer, CRM, and Views Flash Survey. TS is aware of the problem and is investigating it.
The installation of the new storage array and SAN to support upgrades of the database and email systems was completed with the reboot of all database servers from 5:30 to 6:30 this morning. CRM was restored at 7:45 AM. The extended timeframe was necessary to ensure that all existing and new LUNs had redundant paths to their hosts in the SAN.
On Saturday, October 25, Technology Services will perform important updates to our database systems and storage area network. This work will affect our database and email systems. To minimize service disruption, the outages will occur in two separate windows:
7 AM until 5 PM – Cascade Web, Banner, Famis, Basis, and the CRM will be unavailable.
7:30 AM until 9:00 AM – Email will be unavailable.
Network files, shares, and other network services will not be affected.
CRM fulfillment encountered an error while processing an email (a rollback segment problem). Restarting the email caused a runaway CPU-intensive session, which kept running even after the email was cancelled.
Cause of the runaway session: the user created content with the wrong content type (the admission content type instead of campus email), which relied on admission data being present in the system.
Resolution: We bounced the application server and killed the runaway session. The user re-created the email with the correct content type.
Service has been restored to all central database systems (Cascade, Cascade Web, Banner, Famis, Millennium, Basis and CRM) as of 3 PM. Please contact the Help Desk, 253.879.8585 or firstname.lastname@example.org, if you continue to experience problems.
All central database systems are unavailable. This service disruption began at 11 AM today, and the impacted services include Cascade, Cascade Web, Banner, Famis, Basis, CRM and Millennium. Technology Services has determined that the disruption was caused by a power outage and is working with Facilities Services to resolve it as quickly as possible.
UPDATE AS OF 2 PM:
Stable power has been restored, and Technology Services is now focused on restarting and verifying the database systems. Every effort is being made to restore services as quickly as possible. We estimate that service will resume at 3 PM today at the earliest.
At 6 AM this morning, a reload of the core switch caused the network interfaces on the Cascade Web server to fail. As a result, the Cascade web interface was unavailable from 6 to 8:15 AM. The server was rebooted and recovered, and service was restored at approximately 8:15 AM.
Other affected servers were grace, camano, and crystal, all of which were rebooted to restore service at 7:30 AM. Batch job processing was affected between 6 and 7:30 AM.