Due to a reconfiguration, the University’s antivirus gateway stopped forwarding email. The system was rebooted, and the problem was cleared. A slight email backup occurred, which was cleared in 1-2 hours.
Category Archives: Failures
Loss of MX record
During the implementation of e-mail routing changes on 2/2/04, the MX record for the ups.edu domain on the internal DNS server was inadvertently disabled.
This problem has been fixed, and the backlog of messages held by the anti-virus gateway is now being delivered.
High volume of messages in AVG queue
A high volume of messages has been noticed in the queues of the anti-virus gateway. The cause of this situation is not clear. Currently, messages are taking anywhere from 15 minutes to several hours to be delivered. One cause may be the high volume of virus/worm traffic.
Mail service reconfiguration
The University e-mail service has been reconfigured to route as many messages as possible through the anti-virus gateway before they are sent off-campus or received by the University’s mail servers.
This additional stop for messages increases the delivery time, but is necessary to reduce the propagation of e-mail viruses and worms.
Shares on new academic file server repaired
File shares on the new ALEXANDRIA academic file server had incorrect share permissions set. This caused users to be locked out of their file space. The permissions were reset to the proper values.
Wounded IP Pool in Union Ave (Greek Row) exhausted
Due to the problem with the DHCP server today, students in the Greek Row subnet had problems registering, which caused them to back up in the wounded IP range, exhausting the wounded pool.
The wounded pool was increased in all subnets to relieve the problem.
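A minimal sketch of how pool exhaustion like this might be spotted ahead of time, assuming the pool is a contiguous address range and that a list of currently leased addresses is available. The function name, the 192.0.2.x addresses, and the 80% warning threshold are all hypothetical and are not taken from the Resnet DHCP configuration.

```python
import ipaddress

def pool_utilization(first, last, leased):
    """Return (used, size, fraction_used) for an inclusive address pool."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    if end < start:
        raise ValueError("pool end precedes pool start")
    size = end - start + 1
    # Count only distinct leases that actually fall inside the pool.
    used = len({a for a in leased
                if start <= int(ipaddress.IPv4Address(a)) <= end})
    return used, size, used / size

# Hypothetical pool of 10 addresses, 9 of them leased.
leases = [f"192.0.2.{n}" for n in range(10, 19)]
used, size, frac = pool_utilization("192.0.2.10", "192.0.2.19", leases)
if frac >= 0.8:  # assumed warning threshold
    print(f"pool nearly exhausted: {used}/{size} addresses in use")
```

Running a check of this kind per subnet would show which pools are close to full before clients start failing to get an address.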
Resnet DHCP server conf file refresh failed
As happened on 15 September 2003, the auto-restart script on the Resnet DHCP server failed. The script was restarted, and this solved the problem.
Cbord Odyssey unresponsive to registers
The Cbord server was reported to not be responding to inquiries from the cash registers at approximately 12:30 PST. Upon inspection of the server, it was discovered that pcAnywhere was hung waiting for a connection. We received a VFEP error when trying to access the Odyssey Control panel.
It is our belief that C-Bord support encountered errors when doing routine maintenance that hung the server.
RESOLUTION: We rebooted the Odyssey server and had the cash registers re-inquire.
Corporate Time slow
Several users have recently reported slow response from the Corporate Time server. During the course of our investigation of the problem, several orphaned ssh processes were found on Big Ben and the log files were found to be quite large.
The orphaned processes were dealt with at 16:00 PST. The Corporate Time service was stopped at approximately 20:30 PST. The log files were rotated and the service restarted. The University web server was restarted to allow users to access the Corporate Time web interface.
Proxy Server Hang
The University’s proxy server hung this morning. The server was restarted, and returned to normal operation.
Internet Connection Instability II
The University’s internet connection was inoperable this morning from about 7 AM until about 10:30 AM. The problem was isolated to the PIX firewall. A large number of logged outbound connection attempts (ca. 97000) from forged class A source addresses to a destination in Northern California were discovered. The sheer number of such attempts constituted a denial of service situation for the firewall.
The true source of the connection attempts was traced to a student computer in Todd/Phibbs. The student’s network port was deactivated, and the firewall was rebooted. This restored internet service.
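As a rough illustration of how forged sources of this kind can be picked out of a log: any outbound attempt whose source address lies outside the campus address space cannot be legitimate. The sketch below assumes the (source, destination) pairs have already been extracted from the firewall log. The campus range shown is a placeholder documentation network, and the function name and sample records are hypothetical.

```python
import ipaddress
from collections import Counter

# Hypothetical campus address space; substitute the real network ranges.
CAMPUS_NETS = [ipaddress.ip_network("192.0.2.0/24")]

def forged_attempts(records):
    """Count, per destination, outbound attempts with a non-campus source.

    `records` is an iterable of (source_ip, destination_ip) strings.
    """
    counts = Counter()
    for src, dst in records:
        addr = ipaddress.ip_address(src)
        if not any(addr in net for net in CAMPUS_NETS):
            counts[dst] += 1
    return counts

# Made-up sample: two forged sources and one legitimate campus host.
sample = [
    ("10.1.2.3", "198.51.100.7"),
    ("44.5.6.7", "198.51.100.7"),
    ("192.0.2.15", "203.0.113.9"),
]
for dst, n in forged_attempts(sample).most_common():
    print(f"{n} forged-source attempts to {dst}")
```

A single destination with a very large count, as in this incident, points to one misbehaving or compromised host rather than ordinary traffic.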
Internet connection instability I
The University’s Internet connection was unstable from 7 PM to 9:30 PM. We are unsure of the reason. We checked for unusual traffic at the core router and found none. The traffic analysis graph shows an unusually flat usage level of about 5 Mbps from 7 PM to 9 PM; usage then dropped to almost 0, and is now recovering.
The internet service provider has been called.
Micros Server Adjustments
The Micros server, which controls the cash registers in the CBORD One-card system, ran out of disk space on the C: drive, causing several services to crash. This in turn caused the cash registers to go offline.
The pagefile was split between the C: and D: drives in the following manner: 500 MB on C: and 1 GB on D:. Several temp files and an old log file were also deleted.
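A small sketch of a free-space check that could warn before a drive fills up, using Python's standard `shutil.disk_usage`. The 10% threshold and the function names are assumptions for illustration. The example checks the current directory so that it runs anywhere; on the server itself the path would be the drive root, such as `C:\`.

```python
import shutil

def free_fraction(path):
    """Return the fraction of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def is_low(path, minimum=0.10):
    """True when free space is below `minimum` (an assumed threshold)."""
    return free_fraction(path) < minimum

if is_low("."):
    print("warning: less than 10% of the volume is free")
else:
    print(f"{free_fraction('.'):.0%} of the volume is free")
```

Scheduling a check like this to run periodically would surface the problem before services begin to crash.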
The NT backup was moved from 3:30 AM to 2:30 AM, because it was not finishing before opening time for the cafe and diner.
Cascade web interface down
The Cascade web interface was down for several hours on the morning of October 8, 2003, following the regularly scheduled Information Services preventative maintenance period.
This outage was partially the result of some miscommunication within Information Services, and the improper shutdown of the Oracle Application Server on the affected system, Camano.
The backup server was brought on line around 11:00 am and remained up until late in the afternoon. The issues surrounding the failure are being addressed both within Information Services and with Oracle.
The issue with the production instance of the Oracle Application Server was resolved, and the instance was brought back on-line by 5:00 pm.
Sendmail configuration files upgraded
The sendmail configuration files on the University’s mail server have been upgraded to version 8.12.10. The earlier problems with the configuration files and the vacation program have been resolved. The vacation program has also been upgraded to prevent rev-lag.
The sendmail configuration files were also modified to use the correct os_type.