Email Problems

There were reports last night and this morning that users were having problems connecting to webmail. The problem appears to be linked to the Sophos PureMessage software, the timing of its quarantine digest messages, and the slow disk array.

We updated our PureMessage configuration to send one digest message instead of having each server send individual messages. As a result, all the messages hit the mail server at the same time and, combined with client access at the start of the business day, overwhelmed it.

We are changing the interval for digest messages from twice a day to once per day and moving the send time to 1:30 am each morning.
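The reasoning behind the new schedule can be checked with a short sketch. This is illustrative only and is not PureMessage configuration: the function names and the 8:00 to 17:00 business-day window are assumptions made for the example. It computes the next daily run at 1:30 am and confirms that it falls outside business hours.

```python
from datetime import datetime, time, timedelta

# Send time taken from the notice above: 1:30 am, once per day.
DIGEST_TIME = time(hour=1, minute=30)


def next_digest_run(now: datetime, send_at: time = DIGEST_TIME) -> datetime:
    """Return the next daily run at `send_at` strictly after `now`."""
    candidate = datetime.combine(now.date(), send_at)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def in_business_hours(moment: datetime, start_hour: int = 8, end_hour: int = 17) -> bool:
    """Hypothetical business-day window (8:00 to 17:00), assumed for this check only."""
    return start_hour <= moment.hour < end_hour
```

With a single overnight run, `in_business_hours(next_digest_run(now))` is false for any `now`, so the digest burst no longer coincides with the morning login peak.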

MERLIN2 Problems

MERLIN2, the administrative file server, became unresponsive for unknown reasons, causing workstations connected to it to become unresponsive as well.

The server was rebooted, which cleared the problem. Workstations may need to be rebooted as well.

IMAP, POP3, and Webmail Slowness II

Sun has identified some issues with our disk array configuration and has provided new settings for us to apply. One of the changes has been made, with little effect on disk performance. The other two changes require that the email server be shut down. We are scheduling this now. Please refer to the “Scheduled Outages” page (http://www2.ups.edu/ois/nssg/network/alerts.shtml) for the latest information.

WebMail Problems I

Today at about 10:00 AM, the HelpDesk reported a major failure in WebMail.

We noticed a large number of processes (about 1000 and growing) on the mail server, and a larger than normal amount of mail in the queue. WebMail was stopped, server processes on the mail server were stopped, and the mail queues were processed by hand.
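A runaway process count like this can be caught early with a simple check. The sketch below is a minimal example and not the monitoring actually in use: the threshold of 500 and the function names are assumptions, and it relies only on the standard `ps -e` listing (one header line followed by one line per process).

```python
import subprocess

# Hypothetical alert threshold; the notice mentions about 1000 processes during the failure.
PROCESS_ALERT_THRESHOLD = 500


def count_processes(ps_output: str) -> int:
    """Count processes in `ps -e` style output: one header line, then one line per process."""
    lines = [line for line in ps_output.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def is_overloaded(process_count: int, threshold: int = PROCESS_ALERT_THRESHOLD) -> bool:
    """Report whether the process count exceeds the alert threshold."""
    return process_count > threshold


def current_process_count() -> int:
    """Run `ps -e` on the local host and count the processes it lists."""
    result = subprocess.run(["ps", "-e"], capture_output=True, text=True, check=True)
    return count_processes(result.stdout)
```

Running `is_overloaded(current_process_count())` from a scheduled job would flag the condition before the process table grows to the level seen here.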

SAN Maintenance and New Installations

On Saturday we will restructure the fibre channel fabric by replacing the existing 8-port switch with two 24-port switches (32 total ports enabled). As part of the restructuring, the database systems (Lenel, Rainier, Crystal, and Grace) will be down. The SAN and tape library will also be down during the restructuring.

We will be attempting a variety of hardware-related tasks during the downtime:

1. Turning the two Dell cabinets 180 degrees to provide better air flow.

2. Reworking the network, fibre, and power cables in the Dell cabinets for better access.

3. Migrating and rezoning all fibre channel connections to the new switches.

4. Installing a new HBA in Rainier and upgrading the PowerPath license.

5. Adding additional systems to the fibre channel fabric (Exchange Servers, AX100 appliance, Veronica, Merlin2 and Alexandria).

We estimate that this will take approximately eight hours. We will be starting at 8:00 am on Saturday, June 11.

E-mail problem: 550 5.3.0 Can’t create output

Some users reported this morning that they were unable to send messages to other users. They all received the same error:

Final-Recipient: RFC822; username@ups.edu
X-Actual-Recipient: RFC822; username@ups.edu
Action: failed
Status: 5.3.0
(reason: Can’t create output)

This error was the result of an incomplete deactivation of the quota system. The quota system had been turned off, but its entries had not been removed from the fstab file. When the system was rebooted yesterday, the quota system was re-enabled and locked accounts that had exceeded their time limit. This issue has been corrected.
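A leftover quota entry of this kind can be found by scanning the fstab file before a reboot. The sketch below is a minimal example under stated assumptions: it assumes Linux-style option names (`quota`, `usrquota`, `grpquota`), which may differ on other systems, and the function name is hypothetical.

```python
from typing import List

# Assumed Linux-style quota mount options; other systems use different names.
QUOTA_OPTIONS = {"quota", "usrquota", "grpquota"}


def quota_mounts(fstab_text: str) -> List[str]:
    """Return mount points whose fstab options still enable quotas."""
    mounts = []
    for raw_line in fstab_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, options = fields[1], fields[3].split(",")
        # Options may carry values, e.g. usrquota=aquota.user
        names = {opt.split("=", 1)[0] for opt in options}
        if names & QUOTA_OPTIONS:
            mounts.append(mount_point)
    return mounts
```

Passing the contents of `/etc/fstab` to `quota_mounts` returns the mount points that would have quotas switched back on at the next boot; an empty list means the deactivation is complete.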