Changes

no edit summary
Line 1: Line 1:  
{{Incident
 
{{Incident
   |brief=Switch reboot took down DNS
+
   |brief=Switch reboot took down DNS, which broke selector
 
   |severity=Moderate
 
   |severity=Moderate
   |impact=High (Dead air for around 2 minutes)
+
   |impact=High (Dead air for around 45 minutes)
   |start=25/02/2017 16:29
+
   |start=2020-05-11 05:00
   |end=25/02/2017 17:30
+
   |end=2020-05-11 05:46
 
   |mitigation=Reduce dependency on uplink
 
   |mitigation=Reduce dependency on uplink
 
   |leader=Connor Sanders (CS)
 
   |leader=Connor Sanders (CS)
   |others=Isaac Lowe (IL)
+
   |others=Isaac Lowe (IL), Marks Polakovs (MP)
 
}}
 
}}
  −
(Total dead air: 05:02:02-05:46)
      
== Summary ==
 
== Summary ==
Line 24: Line 22:  
* Reduce dependency on upstream services
 
* Reduce dependency on upstream services
 
:* Investigate a local caching DNS resolver?
 
:* Investigate a local caching DNS resolver?
* Ask ITS kindly to tell us when they take down our campus uplink
+
:* '''MP - done-ish, running unbound on uryfw0, and many (but not all) boxes use it'''
* Ask ITS kindly to make it reboot at xx:30 instead of xx:00
+
* Ask ITS nicely to tell us when they take down our campus uplink
 +
:* '''MP - done'''
 +
* Ask ITS nicely to make it reboot at xx:30 instead of xx:00
 +
:* '''MP - done'''
 
* Improve documentation and logging of the new WebStudio services, to make future troubleshooting easier
 
* Improve documentation and logging of the new WebStudio services, to make future troubleshooting easier
 +
* Figure out why Dearie-Me didn't fire - possibly needs recalibrating
 
* Reduce log spamminess of Dearie-Me, it filled up its journald buffer quite quickly
 
* Reduce log spamminess of Dearie-Me, it filled up its journald buffer quite quickly
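
The note above says many, but not all, boxes use the unbound resolver on uryfw0. A minimal sketch of how one might check a given box, assuming it is configured through a standard <code>resolv.conf</code>: the function names and the address <code>192.0.2.53</code> are hypothetical placeholders, not the real address of uryfw0.

```python
def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the nameserver addresses listed in resolv.conf-style text, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments introduced by '#' or ';', then surrounding whitespace.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_local_resolver(resolv_conf_text: str, resolver_ip: str) -> bool:
    """True if the first configured nameserver is the local caching resolver."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and servers[0] == resolver_ip


if __name__ == "__main__":
    # 192.0.2.53 is a documentation-range placeholder (assumption), not uryfw0's address.
    with open("/etc/resolv.conf") as f:
        print(uses_local_resolver(f.read(), "192.0.2.53"))
```

Only the first nameserver is checked, on the assumption that a box listing the local resolver second would still depend on the upstream resolver for most queries.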
 +
 +
== Timings ==
 +
 +
                   HH:MM:SS.mmm
 +
  Dead air start:  05:02:06.500
 +
  Dead air end:    05:45:42.000
 +
  TOTAL:           00:43:35.500
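
The total above is just the end time minus the start time. A minimal sketch of that arithmetic, assuming both timestamps fall on the same day; <code>dead_air_duration</code> is a hypothetical helper name, not part of any existing tooling.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S.%f"  # HH:MM:SS with fractional seconds


def dead_air_duration(start: str, end: str) -> str:
    """Return end - start as HH:MM:SS.mmm, assuming both times are on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    total_ms = delta // timedelta(milliseconds=1)  # whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


print(dead_air_duration("05:02:06.500", "05:45:42.000"))  # 00:43:35.500
```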
    
[[Category:Incident Reports]]
 
[[Category:Incident Reports]]