<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://ury.org.uk/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=11103</id>
	<title>URY Wiki - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://ury.org.uk/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=11103"/>
	<link rel="alternate" type="text/html" href="https://ury.org.uk/wiki/Special:Contributions/11103"/>
	<updated>2026-05-20T17:13:09Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.44.2</generator>
	<entry>
		<id>https://ury.org.uk/mediawiki/index.php?title=Incident_Report:_20200511&amp;diff=1115</id>
		<title>Incident Report: 20200511</title>
		<link rel="alternate" type="text/html" href="https://ury.org.uk/mediawiki/index.php?title=Incident_Report:_20200511&amp;diff=1115"/>
		<updated>2020-05-11T11:36:33Z</updated>

		<summary type="html">&lt;p&gt;11103: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Incident&lt;br /&gt;
  |brief=Switch reboot took down DNS, which brooke&#039;d selector&lt;br /&gt;
  |severity=Moderate&lt;br /&gt;
  |impact=High (Dead air for around 45 minutes)&lt;br /&gt;
  |start=2020-05-11 05:00&lt;br /&gt;
  |end=2020-05-11 05:46&lt;br /&gt;
  |mitigation=Reduce dependency on uplink&lt;br /&gt;
  |leader=Connor Sanders (CS)&lt;br /&gt;
  |others=Isaac Lowe (IL)&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
At 05:00:00, our IT Services uplink switch, urysw4, rebooted for a regularly scheduled update (that nobody was aware of because we weren&#039;t on the ITS comms list).&lt;br /&gt;
&lt;br /&gt;
AutoSwitcher had started dutifully doing the news at 04:59:45, and was preparing to finish doing the news at 05:02:00. It tried to switch back from WebStudio (the news is layered over a silent WebStudio source... don&#039;t ask) to Jukebox, but it found that, since our uplink was down, it couldn&#039;t reach the campus DNS servers, thus couldn&#039;t resolve &#039;&#039;selector.york.ac.uk&#039;&#039;, and thus couldn&#039;t switch back.&lt;br /&gt;
&lt;br /&gt;
That left us with an empty WebStudio source broadcasting. Liquidsoap detected the silence (and sent a rather beautiful &amp;quot;Source 0 was on air&amp;quot; silence email), but couldn&#039;t switch back to Jukebox for the same reason. The switch finished rebooting at 05:04:30, but we were stuck on dead air. Dearie-Me, for some inexplicable reason, didn&#039;t fire until 05:10 (presumably static was keeping it from hitting the threshold). CS woke up and saw the alerts at 05:37, then switched back to Jukebox at 05:46 and investigated with the assistance of IL.&lt;br /&gt;
&lt;br /&gt;
== Reoccurrence mitigation ==&lt;br /&gt;
&lt;br /&gt;
* Reduce dependency on upstream services&lt;br /&gt;
** Investigate a local caching DNS resolver?&lt;br /&gt;
* Ask ITS kindly to tell us when they take down our campus uplink&lt;br /&gt;
* Ask ITS kindly to make it reboot at xx:30 instead of xx:00&lt;br /&gt;
* Improve documentation and logging of the new WebStudio services, to make future troubleshooting easier&lt;br /&gt;
* Figure out why Dearie-Me took until 05:10 to fire; it possibly needs recalibrating&lt;br /&gt;
* Reduce log spamminess of Dearie-Me, which filled up its journald buffer quite quickly&lt;br /&gt;
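&lt;br /&gt;
As a rough illustration of reducing the DNS dependency, the sketch below remembers the last address a hostname resolved to and reuses it when resolution fails. It is not how AutoSwitcher or Liquidsoap currently work; the function name, the cache dictionary and the placeholder address are all assumptions made for the example.&lt;br /&gt;
&lt;br /&gt;
```python
import socket


def resolve_with_fallback(host, cache, resolver=socket.gethostbyname):
    """Return an IP address for host, reusing the last known one if DNS fails.

    Hypothetical helper: `cache` is a plain dict owned by the caller.
    """
    try:
        address = resolver(host)
    except OSError:
        # socket.gaierror is a subclass of OSError, so this covers
        # an unreachable or failing DNS server.
        if host in cache:
            return cache[host]
        raise
    # Remember the fresh answer for the next outage.
    cache[host] = address
    return address


if __name__ == "__main__":
    known = {}

    def dns_down(_host):
        raise socket.gaierror("simulated DNS outage")

    # 192.0.2.10 is a documentation-only placeholder address.
    resolve_with_fallback("selector.example", known, resolver=lambda h: "192.0.2.10")
    print(resolve_with_fallback("selector.example", known, resolver=dns_down))
```

A local caching resolver would give the same effect without touching each service, so this is only one possible shape for the mitigation.&lt;br /&gt;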
&lt;br /&gt;
== Timings ==&lt;br /&gt;
&lt;br /&gt;
                   HH:MM:SS.mmm&lt;br /&gt;
  Dead air start:  05:02:06.500&lt;br /&gt;
  Dead air end:    05:45:42.000&lt;br /&gt;
  TOTAL:           00:43:35.500&lt;br /&gt;
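&lt;br /&gt;
The total above can be checked with a few lines of Python. This is only a sanity check of the arithmetic, using the two timestamps from the table; the variable names are made up for the example.&lt;br /&gt;
&lt;br /&gt;
```python
from datetime import datetime

# Timestamps copied from the Timings table (HH:MM:SS.mmm).
FMT = "%H:%M:%S.%f"
dead_air_start = datetime.strptime("05:02:06.500", FMT)
dead_air_end = datetime.strptime("05:45:42.000", FMT)

# Subtracting two datetimes gives a timedelta.
total = dead_air_end - dead_air_start
print(total)  # 0:43:35.500000
```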
&lt;br /&gt;
[[Category:Incident Reports]]&lt;/div&gt;</summary>
		<author><name>11103</name></author>
	</entry>
	<entry>
		<id>https://ury.org.uk/mediawiki/index.php?title=Incident_Report:_20200511&amp;diff=1114</id>
		<title>Incident Report: 20200511</title>
		<link rel="alternate" type="text/html" href="https://ury.org.uk/mediawiki/index.php?title=Incident_Report:_20200511&amp;diff=1114"/>
		<updated>2020-05-11T11:35:22Z</updated>

		<summary type="html">&lt;p&gt;11103: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Incident&lt;br /&gt;
  |brief=Switch reboot took down DNS, which brooke&#039;d selector&lt;br /&gt;
  |severity=Moderate&lt;br /&gt;
  |impact=High (Dead air for around 45 minutes)&lt;br /&gt;
  |start=2020-05-11 05:00&lt;br /&gt;
  |end=2020-05-11 05:46&lt;br /&gt;
  |mitigation=Reduce dependency on uplink&lt;br /&gt;
  |leader=Connor Sanders (CS)&lt;br /&gt;
  |others=Isaac Lowe (IL)&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
At 05:00:00, our IT Services uplink switch, urysw4, rebooted for a regularly scheduled update (that nobody was aware of because we weren&#039;t on the ITS comms list).&lt;br /&gt;
&lt;br /&gt;
AutoSwitcher had started dutifully doing the news at 04:59:45, and was preparing to finish doing the news at 05:02:00. It tried to switch back from WebStudio (the news is layered over a silent WebStudio source... don&#039;t ask) to Jukebox, but it found that, since our uplink was down, it couldn&#039;t reach the campus DNS servers, thus couldn&#039;t resolve &#039;&#039;selector.york.ac.uk&#039;&#039;, and thus couldn&#039;t switch back.&lt;br /&gt;
&lt;br /&gt;
That left us with an empty WebStudio source broadcasting. Liquidsoap detected the silence (and sent a rather beautiful &amp;quot;Source 0 was on air&amp;quot; silence email), but couldn&#039;t switch back to Jukebox for the same reason. The switch finished rebooting at 05:04:30, but we were stuck on dead air. Dearie-Me, for some inexplicable reason, didn&#039;t fire until 05:10 (presumably static was keeping it from hitting the threshold). CS woke up and saw the alerts at 05:37, then switched back to Jukebox at 05:46 and investigated with the assistance of IL.&lt;br /&gt;
&lt;br /&gt;
== Reoccurrence mitigation ==&lt;br /&gt;
&lt;br /&gt;
* Reduce dependency on upstream services&lt;br /&gt;
** Investigate a local caching DNS resolver?&lt;br /&gt;
* Ask ITS kindly to tell us when they take down our campus uplink&lt;br /&gt;
* Ask ITS kindly to make it reboot at xx:30 instead of xx:00&lt;br /&gt;
* Improve documentation and logging of the new WebStudio services, to make future troubleshooting easier&lt;br /&gt;
* Figure out why Dearie-Me took until 05:10 to fire; it possibly needs recalibrating&lt;br /&gt;
* Reduce log spamminess of Dearie-Me, which filled up its journald buffer quite quickly&lt;br /&gt;
&lt;br /&gt;
== Timings ==&lt;br /&gt;
&lt;br /&gt;
                   HH:MM:SS.mmm&lt;br /&gt;
  Dead air start:  05:02:06.500&lt;br /&gt;
  Dead air end:    05:45:42.000&lt;br /&gt;
  TOTAL:           00:43:35.500&lt;br /&gt;
&lt;br /&gt;
[[Category:Incident Reports]]&lt;/div&gt;</summary>
		<author><name>11103</name></author>
	</entry>
</feed>