A network interface on HERMIONE, the Webmail server, shut off, causing the webmail web server to bind to the wrong interface. As a result, Webmail was unavailable overnight. The interface was re-enabled, and webmail was bound to the correct network interface.
Author Archives: myoung
Antivirus Gateway Problems II
The antivirus gateway is having problems keeping up with the number of arriving email messages. The IN queue is increasing constantly.
Antivirus Gateway Failed
The antivirus gateway failed. The server was rebooted to clear the problem.
New Secondary DNS server
Loanshark-slave has been decommissioned as the secondary external DNS server. We have replaced it with a newer server that has been assigned the same IP address for convenience.
RADIUS server moved
The RADIUS server has been moved to a new server with the same IP address as before, so no changes to services using RADIUS should be required. The client file was copied over and tested to ensure the server functions properly.
Google Search Appliance Replaced
The Google search appliance was replaced with a new version this morning. The new appliance is running version 4.0 of the Google software.
Crystal drive locking
The DBA reported I/O errors on the local hard drive that were causing Oracle instances to abort.
Errors found in /var/adm/messages:
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:52:55 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 17:52:55 crystal scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfc95ddf,0 (ssd1):
Sep 1 19:29:31 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 19:29:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 19:30:10 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 20:21:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 20:21:33 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 20:22:05 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
—–END EXCERPT—–
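The OFFLINE/ONLINE cycling in the excerpt above is easier to judge when it is tallied by hour. The following is a minimal sketch, not the procedure actually used during this incident: the function name `summarize` and the two regular expressions are illustrative assumptions, written only against the line format visible in the excerpt.

```python
import re
from collections import Counter

# Assumed patterns, based only on the line format shown in the excerpt.
LOOP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+qlc:.*"
    r"Loop (?P<state>OFFLINE|ONLINE)\s*$"
)
WARN_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+scsi:.*WARNING:"
)


def summarize(lines):
    """Count qlc loop state changes and scsi warnings, grouped by hour.

    Returns two Counters:
      loops[(hour, state)] -> number of "Loop OFFLINE"/"Loop ONLINE" notices
      warnings[hour]       -> number of scsi WARNING lines
    where hour is the timestamp with ":MM:SS" removed, e.g. "Sep 1 17".
    """
    loops = Counter()
    warnings = Counter()
    for line in lines:
        line = line.rstrip("\n")
        m = LOOP_RE.match(line)
        if m:
            hour = m.group("stamp")[:-6]  # drop the trailing ":MM:SS"
            loops[(hour, m.group("state"))] += 1
            continue
        m = WARN_RE.match(line)
        if m:
            warnings[m.group("stamp")[:-6]] += 1
    return loops, warnings
```

To run it against the live log, pass an open file handle, for example `summarize(open("/var/adm/messages"))`. A rising OFFLINE count within a single hour would be the signal worth escalating.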
9/2 Spent the day researching what might be causing the errors. At 4pm, /usr/sbin/lu was used to create a new boot environment to assess the possibility of a drive failure. The new boot environment was made active, but only 75% of the Oracle data on the local hard drive had been copied. The remaining data was copied manually and Oracle was started. No additional I/O errors or OFFLINE/ONLINE cycling occurred.
9/3 Sun Support contacted to help confirm and replace the defective part. The hard drive was identified as defective; a new drive was ordered and delivered. Received installation instructions from the Sun technician.
9/7 New hardware installed.
Crystal – problems connecting to SAN
DBA reported problems with CRYSTAL on Saturday, 8/28 in the evening.
Examined the system on Monday, 8/30. It appeared to be unable to communicate with the SAN. Cleaned the fibre, switch, and HBA, but no luck.
Examined the HBA on 8/31: no LED on the card. Called vendor support; 3.5 hours later the HBA was considered bad and a new HBA was sent out with 4-hour delivery. The new HBA was installed, but the SAN drives still did not load. Lights are now working on both the HBA and the switch.
Called the support engineer back at 9am and left a voice message. Called vendor support at 11am to talk with another engineer and was told our call would be assigned to another engineer. The original engineer called back at 4pm to apologize that another engineer had not been assigned. Dual entries were seen in the switch; the old entry was removed, the system was rebooted, and the SAN drives are now visible.
Restarted databases.
Web Server emergency maintenance
The University web server will be down during the morning of August 31, 2004 for some emergency maintenance related to a hard disk failure.
Media server down
A hard disk failure in the RAID array on the media server caused the server to hang. After several attempts to recover the system, the data was backed up and the system was rebuilt.
Added database to MySQL
I’ve installed PHPCollab on the web server to use in the OIS website redesign. This is the database for the script.
Deleted /aca website
Ron Albertson and I discussed the difficulty of keeping the old /aca website live on the web server. As a result, I deleted the site. I archived it first and will send him and Jack Roundy a copy on CD.
University main pages now load from root directory
On Monday I added some new profiles. After talking things over with Barbara, I took the opportunity to change the location of the main pages so that they now load from the root directory. The index pages in /root and /external_homes are identical, so no links will be broken.
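A quick way to confirm that two copies of a page really are identical is a byte-for-byte comparison. This is only a sketch: the helper name `pages_identical` and the file paths in the usage note are hypothetical, not the actual layout of the web server.

```python
import filecmp


def pages_identical(path_a, path_b):
    """Return True if the two files have exactly the same contents.

    shallow=False forces a byte-for-byte comparison rather than relying
    on file size and modification time alone.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For example, `pages_identical("/root/index.html", "/external_homes/index.html")` would return `True` only while the two copies stay in sync (both paths are assumed for illustration).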
Is that an expression of relief I hear across campus?
Galaxy Backup System Upgraded to Service Pack 3
In order to solve a problem with the Galaxy job controller, which was causing backup jobs to run erratically, Service Pack 3 for CommVault Galaxy 5.0 was applied to the CommServer (lenel), and to the iData agents on crystal and rainier.
Blackboard upgrade
Blackboard has been upgraded from version 6.1.0 to version 6.1.5 with Application Pack. The upgrade was completed with only a short 1-minute interruption of service.