Mail Problems Continue

Our antivirus gateway has continued to function suboptimally. Over the past two days, it has been unable to keep up with the mail load. This is apparently due to the large number of virus-infected messages and a larger-than-normal number of arriving messages. As of the time of this entry, we have approximately 75,000 messages in the queue.

The antivirus gateway has been taken entirely out of the mail flow, so new mail will be delivered in a timely fashion today.

Antivirus Gateway Problems III

All attempts to get the antivirus gateway to process messages faster have failed. Network Associates, the maker of the product, stated that the theoretical throughput limit for the software is 5,000-10,000 messages per hour, if no messages are infected and no other filtering is being done. They were unable to say exactly what is causing what we're seeing.
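To put the vendor's figure in context, here is a rough back-of-the-envelope sketch in Python. It assumes the queue of roughly 75,000 messages reported in the earlier entry and the vendor's best-case throughput range. The arrival rate of new mail is not known from these entries, so it is left as a hypothetical parameter that defaults to zero.

```python
def drain_hours(backlog, rate_per_hour, arrivals_per_hour=0):
    """Estimate hours needed to clear a mail queue.

    Returns None when new mail arrives at least as fast as it is
    processed, since the queue would then never empty.
    """
    net = rate_per_hour - arrivals_per_hour
    if net <= 0:
        return None
    return backlog / net


BACKLOG = 75_000  # approximate queue size from the earlier entry

# Vendor's stated best-case range, with no new mail arriving (assumption).
for rate in (5_000, 10_000):
    print(f"{rate} msgs/hour -> {drain_hours(BACKLOG, rate)} hours")
```

Even under the vendor's ideal conditions, clearing the existing queue would take several hours, and any sustained arrival rate close to the processing rate would keep the queue from draining at all.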

Webmail failure

A network interface on HERMIONE, the Webmail server, shut off, causing the Webmail web server to bind to the wrong interface. This caused Webmail to be unavailable overnight. The interface was re-enabled, and Webmail was bound to the correct network interface.

Crystal drive locking

The DBA reported I/O errors on the local hard drive that were causing Oracle instances to abort.

Errors found in /var/adm/messages:

Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:52:55 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 17:52:55 crystal scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfc95ddf,0 (ssd1):
Sep 1 19:29:31 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 19:29:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 19:30:10 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 20:21:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 20:21:33 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 20:22:05 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):

-----END EXCERPT-----
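For a quick tally of how often the loop cycled, a log like the excerpt above can be summarized with a short script. The following is a minimal Python sketch, not the method used at the time. The regular expression is an assumption based only on the line layout shown in the excerpt, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt: timestamp, host, tag, [ID n facility], message.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<tag>\w+):\s+\[ID \d+ (?P<facility>[\w.]+)\]\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count loop OFFLINE/ONLINE notices and scsi kernel warnings."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        msg = m.group("msg")
        if "Loop OFFLINE" in msg:
            counts["loop_offline"] += 1
        elif "Loop ONLINE" in msg:
            counts["loop_online"] += 1
        elif m.group("tag") == "scsi" and m.group("facility") == "kern.warning":
            counts["scsi_warning"] += 1
    return counts


sample = [
    "Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE",
    "Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE",
    "Sep 1 17:52:55 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):",
]
print(dict(summarize(sample)))
```

To run it against the real log, pass an open file handle instead of the sample list, for example `summarize(open("/var/adm/messages"))`.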

9/2 Spent the day researching what might be causing the errors. At 4pm the system was configured with /usr/sbin/lu to create a new boot environment, to assess the possibility of a drive failure. The new boot environment was made active, but only 75% of the Oracle data on the local hard drive was copied. The remaining data was copied manually and Oracle was started. No additional I/O errors or OFFLINE/ONLINE cycling occurred.

9/3 Sun Support contacted to help confirm and replace the defective part. Hard drive identified as defective; new drive ordered and delivered. Received installation instructions from the Sun technician.

9/7 New hardware installed.

Crystal – problems connecting to SAN

DBA reported problems with CRYSTAL on Saturday, 8/28 in the evening.

Examined the system on Monday, 8/30. It appeared to be unable to communicate with the SAN. Cleaned the fibre, switch and HBA, but no luck.

Examined the HBA on 8/31: no LED on the card. Called vendor support; 3.5 hours later the HBA was declared bad and a new HBA was sent out with 4-hour delivery. New HBA installed, but still not loading SAN drives. Lights are now working on both the HBA and the switch.

Called the support engineer back at 9am and left a voice message. Called vendor support at 11am to talk with another engineer and was told our call would be assigned to another engineer. The original engineer called back at 4pm to apologize that another engineer had not been assigned. Dual entries were seen in the switch; the old entry was removed, the system was rebooted, and the SAN drives are now visible.

Restarted databases.