Mail Problems Continue

Our antivirus gateway is still performing poorly. Over the past two days it has been unable to keep up with the mail load, apparently because of the large number of virus-infected messages combined with a larger-than-normal volume of incoming mail. As of this entry, approximately 75,000 messages are in the queue.

The antivirus gateway has been taken entirely out of the mail flow, so new mail will be delivered in a timely fashion today.

Antivirus Gateway Problems III

All attempts to get the antivirus gateway to process messages faster have failed. Network Associates, the maker of the product, stated that the theoretical throughput limit for the software is 5,000-10,000 messages per hour, provided no messages are infected and no other filtering is being done. They were unable to say exactly what is causing the slowdown we are seeing.
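To put the vendor's figure in perspective, here is a rough back-of-the-envelope sketch. It reads the quoted limit as 5,000 to 10,000 messages per hour and uses the backlog of roughly 75,000 messages reported in the "Mail Problems Continue" entry. The function name and the optional arrival-rate parameter are ours, added for illustration only, and the default assumes (optimistically) that no new mail arrives while the queue drains.

```python
# Rough drain-time estimate for a mail backlog at a fixed processing rate.
# Assumption: rates are constant; real throughput was lower than the
# vendor's theoretical limit, so these are best-case numbers.

def hours_to_drain(queued: int, rate_per_hour: int, arrivals_per_hour: int = 0) -> float:
    """Return hours needed to empty the queue, or infinity if it never drains."""
    net_rate = rate_per_hour - arrivals_per_hour
    if net_rate <= 0:
        return float("inf")
    return queued / net_rate


BACKLOG = 75_000  # approximate queue size reported in the log

for rate in (5_000, 10_000):  # vendor's theoretical range, messages per hour
    print(f"{rate} msg/h -> {hours_to_drain(BACKLOG, rate):.1f} h")
```

Even at the theoretical best case, a backlog of this size would take somewhere between 7.5 and 15 hours to clear, and longer once new arrivals are counted.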

Webmail failure

A network interface on HERMIONE, the Webmail server, shut off, causing the Webmail web server to bind to the wrong interface. As a result, Webmail was unavailable overnight. The interface was re-enabled, and Webmail was bound to the correct network interface.

Crystal drive locking

The DBA reported I/O errors on the local hard drive that were causing Oracle instances to abort.

Errors found in /var/adm/messages:

Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:52 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 17:52:55 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 17:52:55 crystal scsi: [ID 107833 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfc95ddf,0 (ssd1):
Sep 1 19:29:31 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 19:29:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 19:30:10 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):
Sep 1 20:21:32 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Sep 1 20:21:33 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Sep 1 20:22:05 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):

-----END EXCERPT-----
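The pattern in the excerpt (repeated Loop OFFLINE/ONLINE pairs followed by kernel warnings) is easier to spot when the lines are counted. The following is a minimal sketch, not a tool we actually ran: the regular expression is written against the line format shown above, and the names LINE_RE and summarize are our own.

```python
import re
from collections import Counter

# Matches syslog lines of the form shown in the excerpt, e.g.
# "Sep 1 16:31:01 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<source>\w+):\s+\[ID \d+ (?P<facility>[\w.]+)\]\s+(?P<message>.*)$"
)


def summarize(lines):
    """Count loop state changes and kernel warnings in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like the excerpt format
        message = match.group("message")
        if message.endswith("Loop OFFLINE"):
            counts["loop_offline"] += 1
        elif message.endswith("Loop ONLINE"):
            counts["loop_online"] += 1
        elif match.group("facility") == "kern.warning":
            counts["kernel_warning"] += 1
    return counts


# Three lines copied from the excerpt above.
SAMPLE = [
    "Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE",
    "Sep 1 17:51:58 crystal qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE",
    "Sep 1 17:52:55 crystal scsi: [ID 243001 kern.warning] WARNING: /pci@9,600000/SUNW,qlc@2/fp@0,0 (fcp0):",
]

print(dict(summarize(SAMPLE)))
```

To run it against the live log, pass an open file handle for /var/adm/messages to summarize() in place of SAMPLE.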

9/2 Spent the day researching what might be causing the errors. At 4pm the system was configured with /usr/sbin/lu to create a new boot environment, to assess the possibility of a drive failure. The new boot environment was made active, but only 75% of the Oracle data on the local hard drive had been copied. The remaining data was copied manually and Oracle was started. No additional I/O errors or OFFLINE/ONLINE cycling occurred.

9/3 Sun Support contacted to help confirm and replace the defective part. The hard drive was identified as defective, and a new drive was ordered and delivered. Received installation instructions from the Sun tech.

9/7 New hardware installed.

Crystal – problems connecting to SAN

DBA reported problems with CRYSTAL on Saturday, 8/28 in the evening.

Examined the system on Monday 8/30; it appeared to be unable to communicate with the SAN. Cleaned the fibre, switch and HBA, but no luck.

Examined the HBA on 8/31: no LED on the card. Called vendor support; 3.5 hours later the HBA was declared bad and a new HBA was sent out with 4-hour delivery. The new HBA was installed, but the system was still not loading the SAN drives. Lights were now working on both the HBA and the switch.

Called the support engineer back at 9am and left a voice message. Called vendor support at 11am to talk with another engineer and was told our call would be assigned to another engineer. The original engineer called back at 4pm to apologize that another engineer had not been assigned. Dual entries were seen in the switch; the old entry was removed, the system was rebooted, and the SAN drives are now visible.

Restarted databases.

MX record change

The MX records in our DNS zones were changed to mx00.ups.edu and mx01.ups.edu in an effort to normalize the naming convention for our mail exchange servers. This change caused some mail delivery problems, since not all external mail servers picked up the change in a timely manner. A workaround was implemented to allow mail delivery to continue. Mail messages sent between 10:00am and 11:45am (PST, UTC-8:00) seem to have been affected.
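One likely contributor to the delay is ordinary DNS caching: a remote resolver may keep serving the old MX answer until the record's TTL expires. The sketch below only illustrates that arithmetic. The change time and the TTL are hypothetical values, not the ones in our zones, and the function name is ours.

```python
from datetime import datetime, timedelta


def old_answer_expiry(change_time: datetime, old_ttl_seconds: int) -> datetime:
    """Latest time a well-behaved resolver could still hold the pre-change MX answer.

    A resolver that cached the old record just before the change keeps it
    for at most one full TTL, so the old host names should keep accepting
    mail until at least this time.
    """
    return change_time + timedelta(seconds=old_ttl_seconds)


# Hypothetical example: change made at 10:00 with a one-day TTL on the old record.
change = datetime(2004, 1, 1, 10, 0)
print(old_answer_expiry(change, 86_400))
```

Under these assumed values, the old names would need to stay reachable for a full day after the change. Lowering the TTL well ahead of a planned change shortens that window.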

Antivirus gateway – unique headers causing crashes

This morning the McAfee WebShield MailScan SMTP service was found to be unaccountably stopped. The service could be restarted, but would not stay running for longer than 10 minutes. After calling McAfee service, we applied the latest patch set, which had no effect.

After the service consultant mentioned some problems with corrupt SMTP headers, we found several messages in German similar to the following:

------------------------------------------------------------
Received: From hcjogg.pe ([200.121.127.144]) by gehenna.windows.ups.edu (WebShield SMTP v4.5 MR1a P0803.345);
id 1086899371171; Thu, 10 Jun 2004 13:29:31 -0700
From: tmatrix@t-matrix.com.pe
To: sleith@ups.edu
Date: Thu, 10 Jun 2004 19:48:18 GMT
MIME-Version: 1.0
Subject: Libanesen in Berlin [Key:4463]
Importance: Normal
X-Priority: 3 (Normal)
Message-ID:
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"

Habe eben im Fernsehen einen Bericht gesehen, in dem klar hervorging, dass libenesiche und kurdische Moslems in Berlin die Drogenszene und teilweise sogar das Rotlicht-Milieu beherrschen. Der Clou an der Geschichte ist jedoch, dass die Libanesen, die in kriminelle Aktivitaeten verwickelt und Millionen scheffeln, ebenfalls vor dem Sozialamt erscheinen um ihre Sozialhilfe einzufordern. In einigen Szenen im Berliner Sozialamt, konnte man vor lauter Kopftuecher nicht einmal mehr die Waende sehen!Wenn man da bedenkt, dass die Gruenen sich im Moment gegen ein strengeres Immigrations-und Asylgesetz querlegen, kann man nur noch mit Hoffnungslosigkeit und Kopfschuetteln reagieren. Interessant an dem Bericht war ebenfalls, dass Justiz und Politik bisher nicht mit der gebotenen Staerke gegen solche Umtriebe vorgegangen ist, aus Angst man koennte ihnen Auslaenderfeindlichkeit und Rechtslastigkeit vorwerfen! Die Auswirkungen von soviel Dummheit und falscher Toleranz werden wir noch alle bitter bereuen muessen!

------------------------------------------------------------

Once these messages were removed from the antivirus gateway's SMTP queue, the SMTP service operated properly.
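We do not know exactly which header the scanner choked on. For illustration only, the sketch below flags messages whose required headers are missing or empty, as the Message-ID appears to be in the exemplar reproduced above. The function name and the list of required headers are our assumptions, and this is not how the McAfee product inspects mail.

```python
from email import message_from_string


def suspicious_headers(raw_message: str, required=("Message-ID", "Date", "From")):
    """Return the names of required headers that are missing or empty."""
    msg = message_from_string(raw_message)
    problems = []
    for name in required:
        value = msg.get(name)  # None if the header is absent
        if value is None or not value.strip():
            problems.append(name)
    return problems


# Made-up message modelled on the exemplar: the Message-ID header has no value.
SAMPLE = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Date: Thu, 10 Jun 2004 19:48:18 GMT\n"
    "Message-ID:\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)

print(suspicious_headers(SAMPLE))
```

A check like this could be run over queued message files before restarting the service, to pull out candidates for manual review.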

McAfee support has been given samples of the offending messages. We have blocked the relevant IP addresses at the firewall.

Lenel video server rebooted

Security reported problems viewing the Lenel security cameras at approximately 11:30am. The camera windows would appear to open, but there was a problem with connecting to the server. The video server was rebooted at approximately 12:45pm. This corrected the problem.

Blackboard 6.1 Problems I

There have been intermittent problems with the new Blackboard 6.1 server, starting this past weekend. At random intervals the server returns a “500 Internal Server Error”. This can happen on any page.

Blackboard support has been contacted and is looking into the problem now.