Incident Report: 20181114

Incident Report
ITS broke

Summary
Severity: Moderate
Impact: High (Dead air for around 2 minutes)
Event Start: 25/02/2017 16:29
Event End: 25/02/2017 17:30
Recurrence Mitigation: Fix Nameservers

Contacts
Recovery Leader: Isaac Lowe
Other Attendees: Jordan Cameron


(Total dead air: 16:28:44 - 16:30:36)

This page is under development because this literally just happened.

Causes

Basically, ITS broke, so anything at URY that relies on their DNS nameservers broke too: we lost Jukebox, MyRadio and even the studio selector. However, it couldn't have happened at a better time, with Head of Computing Jordan Cameron and Assistant Head of Computing Isaac Lowe both in the station.

Interestingly, Rednet (Dante) and BAPS continued to function properly, so Jordan and Tom Burrows (who just happened to be in the station) did an unplanned Chat and Such, and were later joined by Jacob Dicker.

Approximately 30 seconds before full services resumed (ITS turned themselves off and on again), Isaac was able to restart Jukebox manually, and the crisis was over.

Work Required

Investigate reducing dependency on ITS systems (NTP/DNS).
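
One possible direction for the DNS half of this, sketched in Python: try the normal resolver first and, if it fails, fall back to a small static table of critical hosts. Everything named here is an assumption for illustration only. The hostnames and addresses are placeholders (documentation ranges), and `resolve_with_fallback` is a hypothetical helper, not something that exists in Jukebox or MyRadio.

```python
import socket

# Hypothetical static table of critical hosts. The names and addresses are
# placeholders, not real URY machines.
FALLBACK_HOSTS = {
    "jukebox.example.invalid": "192.0.2.10",
    "myradio.example.invalid": "192.0.2.11",
}


def resolve_with_fallback(hostname, fallback=FALLBACK_HOSTS,
                          resolver=socket.gethostbyname):
    """Return (address, source), where source is "dns" or "fallback".

    The resolver is injectable so the failure path can be exercised
    without actually breaking DNS.
    """
    try:
        return resolver(hostname), "dns"
    except OSError:
        # socket.gaierror is a subclass of OSError, so failed lookups land here.
        if hostname in fallback:
            return fallback[hostname], "fallback"
        # Unknown host and no DNS: nothing sensible to return, so re-raise.
        raise
```

A static table like this goes stale if addresses change, so a local caching resolver may well be the better long-term answer. The sketch is only meant to show the shape of "keep working when the upstream nameservers vanish".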

Honestly, we got lucky. If this had happened at 4am the dead-air time might have been measured in hours rather than minutes.

Also, don't walk into Studio Red and proclaim everything is broken.