This status message pertains to this entry. We are halfway through delivering the 75,000 emails in queue. Approximately 35,000 to go…
Monthly Archives: September 2004
Mail Problems Continue
Our antivirus gateway has continued to function suboptimally. Over the past two days, it has been unable to keep up with the mail load. This is apparently due to the large number of virus-infected messages and a larger-than-normal number of arriving messages. As of the time of this entry, we have approximately 75,000 messages in queue.
The antivirus gateway has been taken entirely out of the mail flow, so new mail will be delivered in a timely fashion today.
Antivirus Gateway Problems III
All attempts to get the antivirus gateway to process messages faster have failed. Network Associates, the maker of the product, stated that the theoretical throughput limit for the software is 5,000-10,000 messages per hour, and only if no messages are infected and no other filtering is being done. They were unable to say exactly what is causing the slowdown we are seeing.
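As a rough back-of-the-envelope check, the vendor's quoted ceiling can be compared with the backlog. The sketch below reads the quoted range as 5,000 to 10,000 messages per hour and uses the 75,000-message queue reported in the "Mail Problems Continue" entry. The function name and the optional arrival-rate parameter are hypothetical, and the default of no new arrivals is an optimistic assumption, not a measurement.

```python
def hours_to_drain(queued, per_hour, arriving_per_hour=0):
    """Hours needed to empty a queue at a fixed processing rate.

    Returns None when arrivals match or exceed throughput, because the
    queue then never drains.
    """
    net = per_hour - arriving_per_hour
    if net <= 0:
        return None
    return queued / net


backlog = 75000  # queue size reported in "Mail Problems Continue"
for rate in (5000, 10000):  # vendor's stated best-case range
    print(rate, "msgs/hour ->", hours_to_drain(backlog, rate), "hours")
```

Even at the best-case rate with no new mail arriving, the backlog would take several hours to clear; any realistic arrival rate only lengthens that.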
Webmail failure
A network interface on HERMIONE, the Webmail server, shut off, causing the Webmail web server to bind to the wrong interface. This made Webmail unavailable overnight. The interface was re-enabled, and Webmail was bound to the correct network interface.
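One general way to guard against this kind of failure is to bind a service to an explicit address, so that it fails loudly when the interface is down rather than attaching somewhere else. The sketch below is illustrative only and is not the Webmail server's actual configuration; the function name is hypothetical and the loopback address is a placeholder.

```python
import socket


def bind_listener(address, port=0):
    """Open a TCP listener on one explicit local address.

    Port 0 asks the operating system to pick any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # bind() normally raises OSError if the address is not configured
    # on any local interface, so a downed interface is noticed at startup.
    sock.bind((address, port))
    sock.listen(5)
    return sock


listener = bind_listener("127.0.0.1")  # placeholder address for the demo
bound_ip, bound_port = listener.getsockname()
print(bound_ip, bound_port)
listener.close()
```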
Antivirus Gateway Problems II
The antivirus gateway is having problems keeping up with the number of arriving email messages. The IN queue is increasing constantly.
Antivirus Gateway Failed
The antivirus gateway failed. The server was rebooted to clear the problem.
New Secondary DNS server
Loanshark-slave has been decommissioned as the secondary external DNS server. We have replaced it with a newer server that has been assigned the same IP address for convenience.
RADIUS server moved
The RADIUS server has been moved to a new server with the same IP address as before, so no changes to services using RADIUS should be required. The client file was copied and tested to ensure proper functioning of the server.
Google Search Appliance Replaced
The Google search appliance was replaced with a new version this morning. The new appliance is running version 4.0 of the Google software.
Crystal drive locking
The DBA reported I/O errors on the local hard drive that were causing Oracle instances to abort.
Errors found in /var/adm/messages:
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:52:55 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 17:52:55 crystal scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfc95ddf,0 (ssd1):
Sep 1 19:29:31 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 19:29:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 19:30:10 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 20:21:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 20:21:33 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 20:22:05 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
-----END EXCERPT-----
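The loop cycling in an excerpt like this can be tallied with a short script. This is a minimal sketch, not the procedure used at the time; the regular expression and function name are assumptions based only on the message format shown above, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches the qlc instance number and loop state in lines like
# "... NOTICE: Qlogic qlc(0): Loop OFFLINE"
LOOP_RE = re.compile(r"qlc\((\d+)\): Loop (OFFLINE|ONLINE)")


def count_loop_events(lines):
    """Count Loop OFFLINE/ONLINE notices per qlc instance."""
    counts = Counter()
    for line in lines:
        match = LOOP_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


sample = [
    "Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE",
    "Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE",
    "Sep 1 17:52:55 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):",
]
counts = count_loop_events(sample)
print(counts[(0, "OFFLINE")], counts[(0, "ONLINE")])
```

Passing an open file handle for /var/adm/messages in place of `sample` would count events across the whole log, since the function only needs an iterable of lines.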
9/2: Researched what might be causing the errors. At 4pm the system was configured with /usr/sbin/lu to create a new boot environment, to assess the possibility of a drive failure. The new boot environment was made active, but only 75% of the Oracle data on the local hard drive had been copied. The remaining data was copied manually and Oracle was started. No additional I/O errors or OFFLINE/ONLINE cycling occurred.
9/3: Sun Support was contacted to help confirm and replace the defective part. The hard drive was identified as defective, and a new drive was ordered and delivered. Received installation instructions from the Sun technician.
9/7: New hardware installed.
Crystal – problems connecting to SAN
The DBA reported problems with CRYSTAL on the evening of Saturday, 8/28.
Examined the system on Monday, 8/30; it appeared to be unable to communicate with the SAN. Cleaned the fibre, switch, and HBA, but no luck.
Examined the HBA on 8/31: no LED on the card. Called vendor support; 3.5 hours later the HBA was judged bad and a new HBA was sent out with 4-hour delivery. The new HBA was installed, but the system was still not loading the SAN drives. Lights are now working on both the HBA and the switch.
Called the support engineer back at 9am and left a voice message. Called vendor support at 11am to talk with another engineer and was told our call would be assigned to another engineer. The original engineer called back at 4pm to apologize that another engineer had not been assigned. Dual entries were seen in the switch; the old entry was removed, the system was rebooted, and the SAN drives are now visible.
Restarted databases.