Incident Report: 20181114
Latest revision as of 23:41, 14 November 2018
Incident Report | |
---|---|
ITS broke | |
Summary | |
Severity | Moderate |
Impact | High (Dead air for around 2 minutes) |
Event Start | 25/02/2017 16:29 |
Event End | 25/02/2017 17:30 |
Recurrence Mitigation | Fix Nameservers |
Contacts | |
Recovery Leader | Isaac Lowe |
Other Attendees | Jordan Cameron |
(Total dead air: 16:28:44 to 16:30:36, i.e. 1 minute 52 seconds)
This page is under development because this literally just happened.
Causes
ITS broke, so anything at URY that relies on their DNS nameservers also broke: we lost Jukebox, MyRadio and even the studio selector. However, it couldn't have happened at a better time, as Head of Computing Jordan Cameron and Assistant Head of Computing Isaac Lowe were both in the station.
Interestingly, Rednet (Dante) and BAPS continued to function properly, so Jordan and Tom Burrows (who just happened to be in the station) did an unplanned Chat and Such, and were later joined by Jacob Dicker.
Approximately 30 seconds before full services resumed (ITS turned themselves off and on again), Isaac was able to manually restart Jukebox and the crisis was over.
Work Required
Investigate reducing dependency on ITS systems (NTP/DNS).
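As a starting point for the DNS half of that investigation, here is a minimal sketch of one possible approach: try the normal resolver first and, if it fails, fall back to a static table of last-known-good addresses. Everything named here is an assumption for illustration only. The hostnames, the IP addresses, the `FALLBACK_HOSTS` table and the `resolve_with_fallback` helper are hypothetical and do not describe how Jukebox, MyRadio or the studio selector actually look up their dependencies.

```python
import socket

# Hypothetical table of last-known-good addresses.
# The names and IPs below are placeholders, not real URY hosts.
FALLBACK_HOSTS = {
    "jukebox.example": "192.0.2.10",
    "myradio.example": "192.0.2.11",
}


def resolve_with_fallback(hostname, fallback=FALLBACK_HOSTS,
                          resolver=socket.gethostbyname):
    """Return (address, source), where source is "dns" or "fallback".

    The resolver is injectable so the failure path can be exercised
    without taking a real nameserver offline.
    """
    try:
        return resolver(hostname), "dns"
    except OSError:  # socket.gaierror is a subclass of OSError
        if hostname in fallback:
            return fallback[hostname], "fallback"
        raise  # no static entry either, so let the caller see the failure
```

A static table goes stale if addresses change, so a local caching resolver may turn out to be the better fix. The sketch is only meant to show the shape of "keep working when upstream DNS disappears".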
Honestly, we got lucky. If this had happened at 4am, the dead air might have been measured in hours rather than minutes.
Also, don't walk into Studio Red and proclaim everything is broken.