Changes

Line 1: Line 1: −
'''PLEASE EVACUATE THE BUILDING SAFELY AND UPDATE SOCIAL MEDIA BEFORE ATTEMPTING TO RESTORE SERVICE'''
  −
   
''This is an ops-critical document. A printed copy is available in the Server Cupboard and should be updated whenever this online version is.''
   −
So, you've done a full shutdown. Or, there was a power cut or zombie apocalypse that interrupted the ability of our physical servers to operate. The good news is that you now think you're ready to turn things back on.
+
Turn servers off in this order, waiting a few seconds between each button:
   −
''Remember - during any power failure it is advised to immediately switch off the transmitter. See [[Shutting Down URY in a Hurry]].''
+
* urystv [no need to wait]
 
+
* ury (thunderhorn)
== Before You Start - Is It Safe Checklist ==
+
* dolby
* Is power back on yet? Has it been stable for a few minutes?
+
* urybsod
* Has the Head of Computing or Station Manager given consent to restoring service?
+
* urysteve
* Does information from Estates, YUSU or other relevant sources suggest all is okay?
+
* urybackup0
* If you are going to re-start AM Transmission, have you got consent from the Chief Engineer to power on the transmitter audio path?
+
* uryfw0
* Do you have at least two technical team members on site (ideally one engineer)?
+
* transmitter, uryblue, uryred [call engineering now]
 
+
* uryrrod [VMWare]
Great - let's give this a go.
  −
 
  −
== Network Infrastructure ==
  −
We got all these servers right? Well they ain't no good until there's a network. You do this stage in [[The Hub]].
  −
 
  −
* urysw4 should come up on its own, as it has PoE [???] - check the injector is on
  −
* Power on urysw3 (The HP ProCurve 2626 [The top one])
  −
* Power on urysw1 (The Netgear GS748T [The bottom one])
  −
 
  −
Have both of these switched on? Is urysw3 blinking happily? Is urysw1 looking like nothing much is happening? Perfect.
  −
 
  −
== Stores Power ==
  −
Our power supply gets a little upset very easily. If this outage was caused by a power cut, chances are you'll want to use this section to restore power to the [[Server Cupboard]] circuit.
  −
 
  −
* Ensure the Transmitter is switched OFF (if it wasn't already, you aren't very good at reading this guide)
  −
* Ensure the AM compressor is switched OFF
  −
* Ensure the output compressor is switched OFF
  −
* Ensure the AM Receiver is switched OFF
  −
 
  −
If you don't do this, then the initial inrush of power from turning on a rack full of equipment will overload the B16 breaker. It will make a noise as things try to turn on, give you a little fright, then promptly trip again. Possibly with a bright flash of light for dramatic effect.
  −
 
  −
* [Two People Required] Turn on the breaker labelled "metal clad sockets"
  −
* If it did not do so automatically, switch on the UPS
  −
* Turn the AM '''Receiver''' back on and ensure it is still tuned to 1350AM
  −
* Turn the lower output compressor back ON (third box from the top)
  −
 
  −
The UPS will now begin charging. If it is fully depleted, it will be around 5 minutes before it will enable output power to the servers. Depending on the BIOS configuration, some may then start to automatically boot. Avoid this, if at all possible.
  −
 
  −
== Critical Servers ==
  −
We identify critical servers as those that enable us to broadcast on AM. URY Policy states that we must have '''two''' operating loggers before restoration of AM service. You'll also want the jukebox to play some noise.
  −
 
  −
* Power on [[uryfw0]]
  −
* No really, turn on uryfw0. Are you sure it’s on yet? Since this is the gateway for all URY systems, other servers may have trouble bringing up interfaces if it is not up.
  −
* Power on [[uryred]]
  −
* Power on [[uryblue]]
  −
* On both the loggers, run <code>sudo service loggerng status</code> and start it if it fails to auto-start
  −
* Power on [[dolby]]
  −
* If you don't get audio, run <code>cd /usr/local/etc/liquidsoap/scripts && sudo ./startAudio.sh</code>
  −
 
  −
The station should start outputting the world's most annoying loop, featuring a happy instrumental tune and someone telling you that we're off air right now. We're most definitely not.
  −
:* If it tries to play a jingle, this might fail spectacularly and go to a loop of Monty Python's Intermission, featuring Alex Boyall giving a grammatically incorrect technical difficulties message.
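The logger check above could be scripted along these lines. This is only a sketch under stated assumptions: it assumes you have SSH access to both loggers from wherever you run it, that <code>service loggerng status</code> exits with 0 when the service is running, and that <code>sudo service loggerng start</code> is the right way to start it. The helper names <code>run_remote</code> and <code>ensure_logger_running</code> are made up for illustration.

```python
import subprocess
from typing import Callable, Sequence

# The two loggers named in the Critical Servers list.
LOGGERS = ("uryred", "uryblue")


def run_remote(host: str, command: Sequence[str]) -> int:
    """Run a command on `host` over SSH and return its exit code.

    Assumption: key-based SSH access to the loggers is available.
    """
    return subprocess.run(["ssh", host, *command], check=False).returncode


def ensure_logger_running(
    host: str, run: Callable[[str, Sequence[str]], int] = run_remote
) -> bool:
    """Return True if loggerng is running on `host`, starting it if needed.

    Assumption: `service loggerng status` exits 0 only when the service is up.
    """
    status = ["sudo", "service", "loggerng", "status"]
    if run(host, status) == 0:
        return True
    # Not running: try to start it once, then check again.
    run(host, ["sudo", "service", "loggerng", "start"])
    return run(host, status) == 0
```

Calling <code>ensure_logger_running(host)</code> for each entry in <code>LOGGERS</code> tells you whether the service is up. It does not replace the two-person check that the loggers have actually recorded audio.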
  −
 
  −
== AM Broadcast ==
  −
You can't start this section until at least 5-10 minutes after following ''Critical Servers''. It might be worth skipping to ''Core Computing Services'' and coming back in a bit. It also requires permission from the Chief Engineer.
  −
 
  −
* '''Two persons''' must '''separately check and verify''' that '''at least two''' logger services are operating correctly and have recorded the last 5 minutes of station output (or in the case of an AM logger, several minutes of static).
  −
* Switch on the AM compressor (second box from the top)
  −
* Switch on the AM Transmitter
  −
 
  −
If any of the following tests fail, '''switch off the transmitter immediately''' and follow [[Transmitter Troubleshooting]]
  −
* Are all four power indicators on the left lit?
  −
* Is the fan on the rear of the unit spinning?
  −
* Is the forward power meter registering approximately 20W?
  −
* Is the reflected power meter registering a negligible level (2-3W is okay, slightly more if it's damp outside)?
  −
* Is the AM receiver showing 3 signal bars on its display?
  −
 
  −
* After 5 minutes, two people should then check the AM loggers.
  −
 
  −
== Core Computing Services ==
  −
Core Computing Services are defined as those which must be operational for URY to broadcast anything other than [[iTones]] (or, at this point, Intermission).
  −
 
  −
* Power on [[urybackup0]] and [[urysteve]]
  −
* ''Wait'' for urysteve to finish booting
  −
* Power on [[ury]]
  −
* Ensure selector is powered in The Hub
  −
* Fade up jukebox in [[Studio Red]], then switch to S1 then back to S3
  −
** This ensures selector state is up to date
  −
** Jukebox should now be playing actual music. You might need to restart it if it's stuck on techlude - on Dolby, <code>cd /usr/local/etc/liquidsoap/scripts && sudo systemctl stop ury-jack && sudo ./startAudio.sh && sel 8 && sel 3</code>
  −
 
  −
That's the absolute basics. Now verify the following are accessible and functioning:
     −
* http://ury.org.uk/
+
''Note: The KVM is powered by a 12V brick, so can’t go on UPS power. So you need to move the monitor cable between each server if one seems to be having trouble going down. The keyboard should still pass through using power gleaned from the PS/2 ports, if you want to risk that.'' [not sure if this is still a thing?]
* https://ury.org.uk/myradio/
  −
* https://ury.org.uk/roundcube/ (including sending test emails both internally and externally)
  −
** It is possible that mta.york.ac.uk is not yet back online, or that it is but refuses to route mail. You can test this with good ol' <code>telnet mta.york.ac.uk 25</code>
  −
* http://ury.org.uk/live/
  −
* live-high, live-mobile, live-high-ogg, jukebox streams are visible at https://audio.ury.org.uk/status
  −
* BAPS (all the presenter and guest PCs)
  −
* Timelord (might need a reboot on eccleston/tennant/smith)
  −
* myradio_daemon
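A rough sketch of how the web and mail checks in this list might be automated. The URLs and the <code>mta.york.ac.uk</code> port 25 test come from the list above; the function names, the timeouts and the "HTTP 200 means okay" rule are assumptions made for illustration. A 200 response only shows that a page is being served, so you still need to log in, send the test emails and look at BAPS, Timelord and myradio_daemon by hand.

```python
import socket
import urllib.error
import urllib.request

# Pages from the verification list above.
WEB_CHECKS = (
    "http://ury.org.uk/",
    "https://ury.org.uk/myradio/",
    "https://ury.org.uk/roundcube/",
    "http://ury.org.uk/live/",
    "https://audio.ury.org.uk/status",
)


def http_ok(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    This is the scripted equivalent of the telnet test: it shows the port
    accepts connections, not that the server will route mail.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, looping over <code>WEB_CHECKS</code> with <code>http_ok(url)</code> and then calling <code>tcp_port_open("mta.york.ac.uk", 25)</code> gives a quick first pass before the manual checks.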
     −
We're now at a point where shows can go on and things will mostly be okay.
+
A big factor in delays when powering down is servers hanging while they wait on NFS/SMB mounts, which is a problem if you’ve already shut down whatever was providing the mount, so stick to this order.
   −
==Dante==
+
As early as possible during this process, try to reach one of: Station Manager; Assistant Station Manager; Programme Controller to inform them of the service outage so they can invoke necessary social media routes.
   −
* Open Dante Controller on one of the studio or production PCs (you might need Wogan up for this), and ensure that everything is happy and you see lots of green check marks.
+
Remember: Once these servers are off, sending emails to @ury.org.uk email accounts doesn't work! Use Slack, @york.ac.uk addresses, Facebook or phone numbers.
 
  −
== Additional Servers ==
  −
Power on other systems:
  −
* urybsod
  −
** Don't forget to remount <code>urybackup0:/pool0/backup</code> to <code>/mnt/pool0</code>
  −
** Use https://urybsod.york.ac.uk/xymon/ to monitor other services
  −
** https://ury.org.uk/loggerng/ should now be available
  −
* uryrrod
  −
** Only provides the mixclouder service and webcams, so it will have no immediate noticeable impact
  −
** Note that in the event of a full power outage, the ITS Cloud may not be immediately available. Patience.
  −
* wogan
  −
** The Windows PCs may be a bit unhappy if it isn't around
  −
** See above about the ITS Cloud.
  −
* urystv
  −
* moyles - you'll need ITS to do this
     −
== Nearly Done ==
+
You now won't have much to do until the power's back on, most likely. Using a manual writing implement, make note of how the procedure went in preparation for [[Cold-Starting URY Systems]] later on.
Everything should be bright and cheery again now. You should now complete a full incident report and make it available online in [[:Category:Incident Reports]].
     −
Ideally, you'd also act on any recommendations this review brings up to make things run better in future.
+
== Rationale ==
 +
* urystv has no critical mounts, so it can be a quick way to shed some load
 +
* ury goes after that since it doesn’t have any mounts elsewhere
 +
* dolby after that, because of postgres
 +
* urybsod has some exports pertaining to log generation, namely to uryrrod and ury
 +
* urysteve now, because of /music
 +
* urybackup0 now because urysteve backs up to it. [If the UPS is absolutely screaming about low battery, you can risk taking this down first as it does draw the most power - still accurate?]
 +
* uryfw0 after all that -- would be handy to still have comms if servers need to cross networks (unmounting loggers)
 +
* The loggers, in no particular order.
 +
* The transmitter must be turned off if the loggers are powered down, and especially if the UPS power fails altogether, because logging is a legal requirement and we cannot log without them. Call engineering to let them know this has happened.
 +
* uryrrod mounts urybsod for mixclouder and urybackup0 for webcams, so it may be unhappy if it's unmounted - this is not critical though
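The mount-related part of this rationale can be expressed as a dependency graph: a server should go down before whatever it mounts from or backs up to. The sketch below encodes only the relationships stated explicitly above (ury and uryrrod use urybsod's exports, uryrrod also uses urybackup0, urysteve backs up to urybackup0) and is not the authoritative order. The real list also reflects load shedding, postgres, the firewall and the loggers, and it deliberately leaves uryrrod until last because its mounts are not critical. <code>shutdown_order</code> is a hypothetical helper name, and <code>graphlib</code> needs Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

# consumer -> providers it mounts from or backs up to (from the rationale).
DEPENDS_ON = {
    "ury": {"urybsod"},
    "uryrrod": {"urybsod", "urybackup0"},
    "urysteve": {"urybackup0"},
}


def shutdown_order(depends_on):
    """Return an order in which every consumer stops before its providers."""
    # TopologicalSorter expects node -> predecessors. For a shutdown, the
    # consumers of a provider are the nodes that must come before it.
    must_stop_first = {}
    for consumer, providers in depends_on.items():
        must_stop_first.setdefault(consumer, set())
        for provider in providers:
            must_stop_first.setdefault(provider, set()).add(consumer)
    # Raises graphlib.CycleError if two servers depend on each other.
    return list(TopologicalSorter(must_stop_first).static_order())
```

Any order it returns puts ury and uryrrod before urybsod, and urysteve and uryrrod before urybackup0. Several valid orders exist, so treat the result as a consistency check on the list rather than a replacement for it.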
    
[[Category:Technical How-Tos]]
 
