The RADIUS server has been moved to a new server with the same IP address as before, so no changes to services using RADIUS should be required. The client file was copied and tested to ensure proper functioning of the server.
Google Search Appliance Replaced
The Google search appliance was replaced with a new version this morning. The new appliance is running version 4.0 of the Google software.
Crystal drive locking
DBA reporting I/O errors on local hard drive causing Oracle instances to abort.
Errors found in /var/adm/messages:
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:52:55 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 17:52:55 crystal scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfc95ddf,0 (ssd1):
Sep 1 19:29:31 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 19:29:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 19:30:10 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 20:21:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 20:21:33 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 20:22:05 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
-----END EXCERPT-----
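The OFFLINE/ONLINE cycling in the excerpt above can be tallied with a short script. What follows is a minimal sketch, not the procedure actually used during the incident: the function name summarize, the counter keys, and the regular expression are illustrative assumptions, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Three lines copied from the /var/adm/messages excerpt above.
SAMPLE = """\
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:52:55 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
"""

# Assumed line layout: month, day, time, host, driver, [ID n facility], message.
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<driver>\w+):\s+\[ID \d+ (?P<facility>[\w.]+)\]\s+"
    r"(?P<message>.*)$"
)


def summarize(text):
    """Count loop state changes and kernel warnings in syslog text."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not in the assumed layout
        message = match.group("message")
        if message.endswith("Loop OFFLINE"):
            counts["loop_offline"] += 1
        elif message.endswith("Loop ONLINE"):
            counts["loop_online"] += 1
        elif match.group("facility") == "kern.warning":
            counts["kernel_warning"] += 1
    return counts


if __name__ == "__main__":
    for key, value in sorted(summarize(SAMPLE).items()):
        print(key, value)
```

To run it against the live log instead of the sample, pass the contents of /var/adm/messages to summarize.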
9/2 Spent researching possible causes of the errors. At 4pm the system was configured with /usr/sbin/lu to create a new boot environment to assess the possibility of a drive failure. The new boot environment was made active, but only 75% of the Oracle data on the local hard drive was copied. The remaining data was copied manually and Oracle was started. No additional I/O errors or OFFLINE/ONLINE cycling occurred.
9/3 Sun Support contacted to help confirm and replace the defective part. Hard drive identified as defective; new drive ordered and delivered. Received installation instructions from Sun Tech.
9/7 New hardware installed.
Crystal – problems connecting to SAN
DBA reported problems with CRYSTAL on Saturday, 8/28 in the evening.
Examined system on Monday 8/30; it appeared unable to communicate with the SAN. Cleaned fibre, switch and HBA, but no luck.
Examined HBA on 8/31: no LED on card. Called vendor support; 3.5 hours later the HBA was declared bad and a new HBA was sent out with 4-hour delivery. New HBA installed, but still not loading SAN drives. Lights are now working on both HBA and switch.
Called back support engineer at 9am and left a voice message. Called vendor support at 11am to talk with another engineer and was told our call would be assigned to another engineer. Original engineer called back at 4pm to apologize that another engineer had not been assigned. Dual entries were seen in the switch; the old entry was removed, the system was rebooted, and the SAN drives are now visible.
Restarted databases.
Web Server emergency maintenance
The University web server will be down during the morning of August 31, 2004 for some emergency maintenance related to a hard disk failure.
Media server down
A hard disk failure in the RAID array on the media server resulted in the server hanging. After several attempts to recover the system failed, the data was backed up and the system was rebuilt.
Added database to MySQL
I’ve installed PHPCollab on the web server to use in the OIS website redesign. This is the database for the script.
Deleted /aca website
Ron Albertson and I discussed the difficulty in keeping the old /aca website live on the web server. As a result, I deleted the site. I archived it first and will send him and Jack Roundy a copy on CD.
University main pages now load from root directory
On Monday I added some new profiles. After talking things over with Barbara, I took the opportunity to change the location of the main pages so that they now load from the root directory. The index pages in /root and /external_homes are identical, so no links will be broken.
Is that an expression of relief I hear across campus?
Galaxy Backup System Upgraded to Service Pack 3
In order to solve a problem with the Galaxy job controller, which was causing backup jobs to run erratically, Service Pack 3 for CommVault Galaxy 5.0 was applied to the CommServer (lenel), and to the iData agents on crystal and rainier.
Blackboard upgrade
Blackboard has been upgraded from version 6.1.0 to version 6.1.5 with Application Pack. The upgrade was completed with only a short 1-minute interruption of service.
Web Server Upgrades Scheduled
On July 13th, the University web server will be upgraded to address several security vulnerabilities. The upgrades will take place between 4:00 PM and 7 PM. The components involved are:
- the web server software itself
- the PHP scripting engine
- the application server that will power the new Ingeniux Content Management System
A brief (ca. 10 second) outage may occur between 6:45 and 7:00 PM. Otherwise, no interruption in service should occur.
MX record change
The MX record on DNS zones was changed to mx00.ups.edu and mx01.ups.edu in an effort to normalize the naming convention for our mail exchange servers. This change resulted in some mail delivery problems, since not all external mail servers picked up the change in a timely manner. A workaround was implemented to allow mail delivery to continue. Mail messages sent between 10:00am and 11:45am (-8:00 PST) seem to have been affected.
Faster Internet Connection
The University doubled the bandwidth (speed) of its Internet connection from 12 to 24 megabits per second (Mbps) on June 30, 2004. The new connection costs only slightly more than the old one. The unit cost of the connection has dropped by almost half because many carriers overbuilt their networks and are loaded with excess capacity.
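The unit-cost claim follows from simple arithmetic: when bandwidth doubles and the total price rises only a little, the price per megabit falls by nearly half. The sketch below illustrates this with made-up figures. The dollar amounts are hypothetical placeholders, not the University's actual costs, and the function name unit_cost_drop is an assumption for illustration.

```python
def unit_cost_drop(old_cost, old_mbps, new_cost, new_mbps):
    """Return the fractional drop in cost per Mbps (0.45 means 45% cheaper)."""
    old_unit = old_cost / old_mbps
    new_unit = new_cost / new_mbps
    return 1 - new_unit / old_unit


# Hypothetical monthly prices: the new link costs 10% more than the old one.
OLD_COST = 1000.0  # placeholder, not a real figure
NEW_COST = 1100.0  # placeholder, not a real figure

drop = unit_cost_drop(OLD_COST, 12, NEW_COST, 24)
print(f"Cost per Mbps fell by {drop:.0%}")  # prints "Cost per Mbps fell by 45%"
```

With any price increase smaller than 100%, doubling the bandwidth lowers the unit cost; the closer the new price is to the old one, the closer the drop gets to one half.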
Directory Server Reboot
The directory server was rebooted at 5:15pm to finish the OS upgrade.
Server back on-line at 5:25pm.