Trimble Hall Network Malfunction IV

The trouble continues in Trimble Hall. Another malfunction occurred with the same symptoms as before. We have isolated the problem to the switch in Trimble 007: we disconnected the Trimble Hall network piece by piece while monitoring connectivity with ping, and connectivity returned when the 007 switch was disconnected. After we reconnected the 007 switch, the network stayed up. We will have to wait until the next failure to isolate the trouble to a specific port on that switch.
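
The piece-by-piece isolation described above relies on knowing exactly when connectivity drops and returns. The following is a minimal sketch of that kind of monitoring, not the procedure actually used. It assumes a Linux-style ping that accepts -c and -W; the function names, the polling interval, and the sample count are all hypothetical.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo to host succeeds.

    Assumption: a Linux-style ping with -c (count) and -W (timeout, seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_transitions(samples):
    """Given (timestamp, reachable) pairs in time order, return the
    samples at which reachability changed from the previous sample."""
    transitions = []
    previous = None
    for timestamp, reachable in samples:
        if previous is not None and reachable != previous:
            transitions.append((timestamp, reachable))
        previous = reachable
    return transitions


def monitor(host, interval_s=1.0, count=60):
    """Ping host count times and report when connectivity changed."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval_s)
    return find_transitions(samples)
```

Matching the transition timestamps against the times at which each segment was unplugged points to the segment whose removal restored connectivity.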

Database Servers dropping connections, losing packets

The database servers tahoma and baker and the front-end forms servers snidley, dudley, and fenwick began dropping persistent client Oracle and telnet connections. ICMP packets were also lost.

We checked the speed and duplex settings on the switch ports to which these servers are connected and found them to be correct (100 Mbps, full duplex).

The problem cleared up after about one hour. We will investigate further.

Clock craziness with GANESH

The clock on GANESH went crazy. Apparently this is due to an unidentified faulty NTP server, which causes a loop when contacted. We are reconfiguring all our NTP clients to use a known subset of NTP servers. GANESH was rebooted to solve the immediate problem.
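
Before settling on the known subset, it may help to sanity-check each candidate server. The sketch below is illustrative only: it sends a basic SNTP client request, reads the transmit timestamp from the reply, and keeps only servers whose time is close to the local clock. The offset estimate is deliberately crude (server time minus the midpoint of local send and receive times), and the function names and the 5-second threshold are assumptions, not part of our configuration.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def transmit_time(packet):
    """Extract the transmit timestamp (bytes 40-47) as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def query_offset(server, timeout_s=2.0):
    """Rough estimate of (server clock - local clock) in seconds."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sent = time.time()
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(512)
        received = time.time()
    return transmit_time(packet) - (sent + received) / 2


def plausible_servers(offsets, max_offset_s=5.0):
    """Keep servers whose measured offset is within max_offset_s.

    The threshold is a hypothetical value chosen for illustration.
    """
    return [name for name, offset in offsets.items()
            if abs(offset) <= max_offset_s]
```

A server that reports a time hours away from every other candidate would be dropped by plausible_servers and left out of the client configuration.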

Best Access System Hung

The B.A.S.I.S. software on KETRON hung. A symptom of this was that none of the B.A.S.I.S. applications on workstations would launch. This included the System Administration and the Alarm Monitoring apps. KETRON was rebooted to clear the problem. Conversations with Lance Holloway indicated that the system should be rebooted more frequently than it is:

Lance:
“I’m not as familiar with the reboot requirements of Win 2k as I was with NT. I know that NT could go maximum about two weeks before memory issues took over. Depending on the application. We’ve seen other sites lock up after a while if they have screens open for event stacks etc, and not be rebooted.”

ACADEMIA Domain problems

Problems with domain logons in ACADEMIA prompted a reboot of the controller for that domain.

Concurrently, the clock on ALEXANDRIA ran wild, apparently due to a problem with the index server on that machine. ALEXANDRIA was rebooted, and the problem went away.