Spam Filter End User Interface Unreliable

Posted on January 9, 2007 by myoung

Due to hardware problems, the spam filter end user interface (the web system where University members check their spam quarantine contents) will be unreliable. Service has been engaged, and we hope to have a resolution today.

6/3 – Database Systems Temporarily Offline

Posted on June 4, 2006 by myoung

The database systems went offline sometime on June 3 this weekend when rainier and crystal lost connections to several disk volumes. The disks were remounted, and the databases were restarted. The DBA was called at 9 AM this morning.

The www2 webserver also became unresponsive because it lost connection to the database. The webserver was restarted once the database came online, restoring service.

5/30 – License problems with the FTP server on www2

Posted on May 30, 2006 by myoung

Sometime during the weekend, the ftp server on www2 decided that its license had expired. Until a new license could be obtained, the standard ftp server (wu-ftpd) was run. This led to some slow response (wu-ftpd operates under the xinetd master daemon) for much of the day. The vendor provided a new license key at 4:30 PM. This was installed, and ftp service was restored to normal.

5/25 – www2 failed

Posted on May 25, 2006 by myoung

The secondary web server www2 became unresponsive when the SurgeFTP service began consuming 99% of the CPU cycles. We were unable to stop the service or gracefully restart the system. The system was powered down, then back on, after which the system ran normally.

5/24 – MERLIN2 failure

Posted on May 24, 2006 by myoung

Merlin2 became unresponsive to fileshare access today at 5:30 PM. The console was still responding, and we were able to log on. Access to the disk arrays appeared to be impaired – we were unable to list the disks or view their contents. No pertinent events were logged in the Event Log. The system was rebooted, and was back to normal.

We have made an adjustment to the antivirus software (changed vendors), and will keep monitoring.

Email Problems

Posted on November 11, 2005 by myoung

There were reports last night and this morning that users were having problems connecting to webmail. The problem appears to be linked to the Sophos PureMessage software, the timing of the quarantine digest messages, and the slow disk array.

We updated our PureMessage configuration to send one digest message instead of having each server send individual messages. As a result, all the messages hit the mail server at the same time and overwhelmed it with e-mail just as client access picked up at the start of the business day.

We are modifying the interval for digest messages from twice a day to once per day and moving the time to 1:30 am each morning.

IMAP, POP3, and Webmail Slowness IV

Posted on October 24, 2005 by myoung

The firmware upgrades did not resolve the problem with disk performance, even though we had a good couple of days. Sun’s final analysis led them to the conclusion that the disk array on which mailboxes reside has simply reached a saturation level in terms of I/O rate. This will mean that we will have to get a faster disk array.

IMAP and POP3 and Webmail Slowness II

Posted on October 19, 2005 by myoung

Sun has identified some issues with our disk array configuration. They have provided some new settings for us to apply. One of the changes has been made, with little effect on disk performance. The other two changes will require that the email server be shut down. We are scheduling this right now. Please refer to the “Scheduled Outages” (http://www2.ups.edu/ois/nssg/network/alerts.shtml) page for the latest information.

IMAP and POP3 and Webmail Slowness I

Posted on October 18, 2005 by myoung

We are currently experiencing disk performance problems on the mail server. This is causing slowness and problems connecting with Webmail and POP3 and IMAP clients. We are working with the hardware vendor to correct the problem.

PureMessage not accepting messages

Posted on May 23, 2005 by myoung

PureMessage stopped accepting messages this morning when the disk volume was filled by log files. Corrections have been made to the logrotate.conf file in an attempt to prevent this from occurring in the future.
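
As an illustration only, a check along the following lines could warn before a log volume fills up again. This is a minimal sketch, not what is running on the PureMessage servers: the function names and the 90% threshold are assumptions made for the example, and any path passed in is up to the caller.

```python
import shutil


def used_fraction(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of a volume that is in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def is_nearly_full(path: str, threshold: float = 0.90) -> bool:
    """Return True when the volume holding `path` is at or above `threshold`.

    The 0.90 default is an assumed value for this sketch, not a site setting.
    """
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return used_fraction(usage.used, usage.total) >= threshold
```

Run from a scheduled job against the directory that holds the log files, a True result from is_nearly_full could trigger a notice to the administrators well before the volume is completely full.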

Network upgrade

Posted on December 19, 2004 by myoung

The planned OS upgrade of core network equipment on Sunday was not as smooth as planned. Two systems had difficulty with the upgrade and required a reboot: the mail server, and the Oracle development server. Otherwise, the upgrade was a success.

As a result of the problem, we have a better understanding of the upgrade process for future upgrades.

Dial-in Failures

Posted on November 17, 2004 by myoung

The radius daemon and portmaster were reset to address several reported issues with dial-in access. The core reason for users not being authenticated is not completely clear, but there are indications that the portmaster or the radius daemon became confused about the appropriate shared secret. Once all entries were reset, authentication started to be validated correctly.

During the process, four modems were identified as failing to respond properly and were removed from service.

Crystal – problems connecting to SAN

Posted on September 1, 2004 by myoung

DBA reported problems with CRYSTAL on Saturday, 8/28 in the evening.

Examined system on Monday, 8/30; system appears to be unable to communicate with SAN. Cleaned fibre, switch and HBA, but no luck.

Examined HBA on 8/31; no LED on card. Called vendor support. 3.5 hours later the HBA was considered bad and a new HBA was sent out with 4 hour delivery. New HBA installed, but still not loading SAN drives. Lights are now working on both HBA and switch.

Called back support engineer at 9am; left voice message. Called vendor support at 11 am to talk with another engineer and was told our call would be assigned to another engineer. Original engineer called back at 4pm to apologize that another engineer had not been assigned. Dual entries seen in switch; old entry removed, system rebooted, and SAN drives are now visible.

Restarted databases.

Media server down

Posted on August 27, 2004 by myoung

A hard disk failure in the RAID array on the media server resulted in the server hanging. After several attempts to recover the system, the data was backed up and the system was rebuilt.

Directory Server Reboot

Posted on June 24, 2004 by myoung

The directory server was rebooted at 5:15pm to finish the OS upgrade.
Server back on-line at 5:25pm.

Puget Sound Technology Services

Service Announcements from Technology Services

Category Archives: Unscheduled Outages