Incident Report: 20181114
Incident Report | |
---|---|
ITS broke | |
Summary | |
Severity | Moderate |
Impact | High (Dead air for around 2 minutes) |
Event Start | 25/02/2017 16:29 |
Event End | 25/02/2017 17:30 |
Recurrence Mitigation | Fix Nameservers |
Contacts | |
Recovery Leader | Isaac Lowe |
Other Attendees | Jordan Cameron |
(Total dead air: 16:28:44 - 16:30:36, i.e. 1 minute 52 seconds)
This page is under development because this literally just happened.
Causes
Basically, ITS broke, so anything at URY that relies on their DNS/nameservers also broke: we lost Jukebox, MyRadio and even the studio selector. However, it couldn't have happened at a better time, as Head of Computing Jordan Cameron and Assistant Head of Computing Isaac Lowe were both in the station.
Interestingly, Rednet (Dante) and BAPS continued to function properly, so Jordan and Tom Burrows (who just happened to be in the station) did an unplanned Chat and Such, and were later joined by Jacob Dicker.
Approximately 30 seconds before full services resumed (ITS turned themselves off and back on again), Isaac was able to manually restart Jukebox, and the crisis was over.
Work Required
Investigate reducing dependency on ITS systems (NTP/DNS).
Honestly, we got lucky. If this had happened at 4am the dead-air time might have been measured in hours rather than minutes.
Also, don't walk into Studio Red and proclaim everything is broken.
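As a starting point for the DNS half of the "reduce dependency on ITS" item, here is a minimal sketch of one possible approach: try the normal lookup first, and fall back to a locally held table if the upstream nameservers don't answer. This is not how any URY service currently resolves names. The hostnames, the addresses (taken from the RFC 5737 documentation range) and the `resolve_with_fallback` helper are all hypothetical, made up purely for illustration.

```python
import socket
from typing import Callable, Dict, Optional

# Hypothetical hostnames and addresses, for illustration only.
# 192.0.2.0/24 is reserved for documentation (RFC 5737).
FALLBACK_HOSTS: Dict[str, str] = {
    "jukebox.example.invalid": "192.0.2.10",
    "myradio.example.invalid": "192.0.2.11",
}


def resolve_with_fallback(
    hostname: str,
    lookup: Callable[[str], str] = socket.gethostbyname,
    fallback: Optional[Dict[str, str]] = None,
) -> str:
    """Resolve hostname normally; use a local table if the lookup fails.

    `lookup` is injectable so the failure path can be exercised
    without actually breaking DNS.
    """
    table = FALLBACK_HOSTS if fallback is None else fallback
    try:
        return lookup(hostname)
    except OSError:
        # socket.gaierror (resolution failure) is a subclass of OSError.
        if hostname in table:
            return table[hostname]
        # No local entry either: re-raise so the caller sees the failure.
        raise


if __name__ == "__main__":
    def dns_is_down(_hostname: str) -> str:
        # Simulates the upstream nameservers not answering.
        raise socket.gaierror("simulated upstream DNS outage")

    print(resolve_with_fallback("jukebox.example.invalid", lookup=dns_is_down))
```

The obvious trade-off is that a static table goes stale whenever an address changes, so it only suits a handful of critical internal services. Running a local caching resolver would be another option worth investigating.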