== Other Points of Note ==

After the initial restoration of services, the following days revealed several issues that required manual intervention to fully restore service:
* When the power failed in the IT Services Berrick Saul Datacenter, our Virtual Machine, uryrrod, was automatically powered on in the TFTA Datacenter. However, because it had no network access and an NFS mount in its fstab, the server was found sitting in single-user mode, waiting for administrator intervention, when it was next checked on Sunday evening.
* The Stores UPS (the main backup supply for our servers) now correctly reports that it has 6 minutes of runtime, not 3-4 (though we know from the experience of this event it is easily 15).
* When the scheduled backup processes ran in the early hours of Monday morning, the backup system failed to detect that the network mountpoints were not set up on uryfs1. The system started backing up to its own root filesystem until all space was consumed, at which point services began to fail overnight. Luckily, Icecast was not among the affected services: it successfully fell back to a static jukebox feed when it detected that Liquidsoap had stopped responding. The issue was not noticed until mid-afternoon on Monday, when a technical member entered the station to see the Studio Clock reporting that the server was out of space (it turns out the Studio Clock does that now).
* The ZFS pools on urybackup0 were not automatically mounted on boot. This was also not noticed until attempts were made to resolve the uryfs1 backup issue. Our primary archive filestore was therefore unavailable to members for most of the day.
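The uryfs1 failure above came down to the backup job writing into an empty directory where a network mount should have been. A guard along these lines (a hypothetical sketch — the real backup scripts and paths on uryfs1 are not shown here) would let the job abort early instead of filling the root filesystem:

```python
import os

# Hypothetical mount targets; the real uryfs1 backup paths may differ.
BACKUP_TARGETS = ["/mnt/backup", "/mnt/archive"]

def mounts_ready(paths):
    """Return True only if every path is a live mountpoint, not just an
    empty directory that would silently swallow data onto the root fs."""
    return all(os.path.ismount(p) for p in paths)

# In the backup script, bail out early instead of writing to /:
#   if not mounts_ready(BACKUP_TARGETS):
#       sys.exit("backup aborted: network mountpoints not present")
```

`os.path.ismount` distinguishes a mounted filesystem from a plain directory of the same name, which is exactly the check that was missing here.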
    
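For reference, the uryrrod single-user-mode hang is the classic failure mode of a hard NFS entry in fstab. An entry along these lines (server and paths are illustrative, not uryrrod's actual configuration) would let the boot continue even when the NFS server is unreachable:

```
# /etc/fstab -- illustrative entry, not uryrrod's real mount.
# 'nofail' stops a failed mount from dropping the boot into
# single-user mode; '_netdev' defers mounting until the network is up.
uryfs1:/export/home  /mnt/home  nfs  nofail,_netdev  0  0
```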
== Changes Made ==