Difference between revisions of "Incident Report: 20181114"

From URY Wiki
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
{{Incident
 
{{Incident
 
   |brief=ITS broke
 
   |brief=ITS broke
   |severity=Critical
+
   |severity=Moderate
 
   |impact=High (Dead air for around 2 minutes)
 
   |impact=High (Dead air for around 2 minutes)
 
   |start=25/02/2017 16:29
 
   |start=25/02/2017 16:29
Line 10: Line 10:
 
}}
 
}}
  
ITS had a major system error. This broke our stuff. More to follow. 
+
(Total dead air: 16:28:44 - 16:30:36)
  
(Total dead air: 16:28:44 - 16:30:36)
+
This page is under development because this literally just happened.
 +
== Causes ==
 +
Basically, ITS broke so anything at URY that's reliant on their DNS/Nameservers also broke, so we lost Jukebox, MyRadio and even the studio selector. However, it couldn't have happened at a better time with Head of Computing Jordan Cameron and Assistant Head of Computing Isaac Lowe both in the station at the same time.
  
At 19:22, output was successfully switched to OB (they were running late anyway), which then proceeded without incident.
+
Interestingly, Rednet (Dante) and BAPS continued to function properly, so Jordan and Tom Burrows (who just happened to be in the station) did an unplanned Chat and Such, and were later joined by Jacob Dicker.  
  
== Causes ==
+
Approximately 30 seconds before the resumption of full services (ITS turned themselves back off and on again) Isaac was able to manually re-start Jukebox and the crisis was over.
Namesevers.  
 
  
 
== Work Required ==
 
== Work Required ==
Namesevers.  
+
Investigate reducing dependency on ITS systems (NTP/DNS).
 +
 
 +
Honestly, we got lucky. If this had happened at 4am the dead-air time might have been measured in hours rather than minutes.
 +
 
 +
Also, don't walk into Studio Red and proclaim everything is broken.  
  
 
[[Category:Incident Reports]]
 
[[Category:Incident Reports]]

Latest revision as of 00:41, 15 November 2018

Incident Report
ITS broke
Summary
Severity Moderate
Impact High (Dead air for around 2 minutes)
Event Start 25/02/2017 16:29
Event End 25/02/2017 17:30
Recurrence Mitigation Fix Nameservers
Contacts
Recovery Leader Isaac Lowe
Other Attendees Jordan Cameron


(Total dead air: 16:28:44 - 16:30:36)

This page is under development because this literally just happened.

Causes

Basically, ITS broke, so anything at URY that relies on their DNS/nameservers broke too: we lost Jukebox, MyRadio and even the studio selector. However, it couldn't have happened at a better time, as Head of Computing Jordan Cameron and Assistant Head of Computing Isaac Lowe were both in the station.

Interestingly, Rednet (Dante) and BAPS continued to function properly, so Jordan and Tom Burrows (who just happened to be in the station) did an unplanned Chat and Such, and were later joined by Jacob Dicker.

Approximately 30 seconds before full services resumed (ITS turned themselves off and on again), Isaac was able to manually restart Jukebox, and the crisis was over.

Work Required

Investigate reducing dependency on ITS systems (NTP/DNS).
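
As a starting point for that investigation, here is a minimal sketch of a DNS health check that could tell us quickly whether name resolution is the thing that's broken. It is not something that exists at URY: the function names and the hostnames in the usage note are hypothetical, and the "every lookup failed" rule is just one possible assumption about what counts as a DNS outage.

```python
import socket
from typing import Callable, Dict, Iterable


def default_resolver(hostname: str) -> str:
    """Resolve a hostname via the system resolver (raises OSError on failure)."""
    return socket.gethostbyname(hostname)


def check_dns(
    hostnames: Iterable[str],
    resolver: Callable[[str], str] = default_resolver,
) -> Dict[str, bool]:
    """Return a map of hostname -> whether it resolved.

    The resolver is injectable so the check can be exercised without a network.
    """
    results: Dict[str, bool] = {}
    for name in hostnames:
        try:
            resolver(name)
            results[name] = True
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = False
    return results


def dns_looks_down(results: Dict[str, bool]) -> bool:
    """Assumed rule: DNS is 'down' only if at least one name was checked
    and none of them resolved."""
    return bool(results) and not any(results.values())
```

Usage would be something like `dns_looks_down(check_dns(["jukebox.example", "myradio.example"]))`, with placeholder hostnames swapped for whichever names the affected services actually depend on.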

Honestly, we got lucky. If this had happened at 4am, the dead air might have been measured in hours rather than minutes.

Also, don't walk into Studio Red and proclaim everything is broken.