Cold-Starting URY Systems

From URY Wiki

Revision as of 11:12, 7 July 2020

PLEASE EVACUATE THE BUILDING SAFELY AND UPDATE SOCIAL MEDIA BEFORE ATTEMPTING TO RESTORE SERVICE

This is an ops-critical document. A printed copy is available in the Server Cupboard and should be updated whenever this online version is.

So, you've done a full shutdown. Or a power cut or zombie apocalypse has interrupted our physical servers. The good news is that you now think you're ready to turn things back on.

Remember - during any power failure it is advised to immediately switch off the transmitter. See Shutting Down URY in a Hurry.

Before You Start - Is It Safe Checklist

  • Is power back on yet? Has it been stable for a few minutes?
  • Has the Head of Computing or Station Manager given consent to restoring service?
  • Does information from Estates, YUSU or other relevant sources suggest all is okay?
  • [Preferred] If you are going to re-start AM Transmission, have you got consent from the Chief Engineer to power on the transmitter audio path?
  • [Preferred] Do you have at least two technical team members on site (ideally one engineer)?

Great - let's give this a go.

Network Infrastructure

We've got all these servers, right? Well, they ain't no good until there's a network. You do this stage in The Hub.

  • Power on urysw3 (The HP ProCurve 2626 [The top one])
  • Power on urysw1 (The Netgear GS748T [The bottom one])

Have both of these switched on? Is urysw3 blinking happily? Is urysw1 looking like nothing much is happening? Perfect.

Stores Power

Our power supply gets a little upset very easily. If this outage was caused by a power cut, chances are you'll want to use this section to restore power to the Server Cupboard circuit.

  • Ensure the Transmitter is switched OFF (if it wasn't already, you aren't very good at reading this guide)
  • Ensure the AM compressor is switched OFF
  • Ensure the output compressor is switched OFF
  • Ensure the AM Receiver is switched OFF

If you don't do this, the inrush current from turning on a rack full of equipment will overload the B16 breaker. It will make a noise as things try to turn on, give you a little fright, then promptly trip again. Possibly with a bright flash of light for dramatic effect.

  • [Two People Required] Turn on the breaker labelled "metal clad sockets"
  • If it did not do so automatically, switch on the UPS
  • Turn the AM Receiver back on and ensure it is still tuned to 1350AM
  • Turn the lower output compressor back ON (third box from the top)

The UPS will now begin charging. If it is fully depleted, it will take around 5 minutes before it enables output power to the servers. Depending on their BIOS configuration, some servers may then start to boot automatically. Avoid this if at all possible.

Critical Servers

We define critical servers as those that enable us to broadcast on AM. URY Policy states that we must have two operating loggers before restoring AM service. You'll also want the jukebox to play some noise.

  • Power on jukebox in The Hub
  • Power on logger1
  • Power on logger2
  • Power on uryred
  • Power on uryblue
  • On logger1, run `/usr/local/etc/rc.d/audiolog.sh start`
  • On logger2, run `/etc/rc3.d/S40audiolog start`. You may also need to `modprobe es1370`
  • Power on uryfw0

If at any stage you are unhappy with the noises, smells, or LED blink patterns on logger1 or logger2, stop, turn that machine back off, and come back to it. If you cannot get both loggers running, you will need to investigate before you continue: uryred is not currently considered a stable logger, so it is not a valid substitute. Generally, they'll work when you try again a little later (they don't like being cold).
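A quick way to confirm that a logger is actually writing files is to look for recently modified files in its output directory. The sketch below assumes audiolog writes ordinary files somewhere under a single directory; that location isn't documented on this page, so you pass it in yourself, and `logger_recent` is a hypothetical helper, not an existing URY tool. A fresh file only shows the logger process is running, so someone still needs to listen back before counting it towards the two-logger requirement.

```sh
#!/bin/sh
# Sketch: does a logger appear to be recording?
# Usage: logger_recent DIR [MINUTES]
# DIR is wherever audiolog writes its output on that machine (assumed to be
# plain files; the location isn't documented here). Succeeds if any file
# under DIR was modified within the last MINUTES minutes (default 5).
logger_recent() {
    _lr_dir=$1
    _lr_mins=${2:-5}
    # Take the first recently modified file, if there is one.
    _lr_file=$(find "$_lr_dir" -type f -mmin "-$_lr_mins" 2>/dev/null | head -n 1)
    if [ -n "$_lr_file" ]; then
        echo "OK: $_lr_file was written in the last $_lr_mins minutes"
        return 0
    fi
    echo "FAIL: nothing under $_lr_dir written in the last $_lr_mins minutes" >&2
    return 1
}
```

For example, on logger1 you might run `logger_recent <audiolog output directory> 5`; the 5-minute window matches the check required under AM Broadcast.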

The jukebox should boot up happily, and the station will start outputting a loop of Monty Python's Intermission, occasionally overlaid with a grammatically incorrect technical difficulties message from Alex Boyall.

AM Broadcast

You can't start this section until at least 5-10 minutes after completing Critical Servers, so it might be worth skipping to Core Computing Services and coming back in a bit. This section also requires permission from the Chief Engineer.

  • Two persons must separately check and verify that at least two logger services are operating correctly and have recorded the last 5 minutes of station output (or in the case of an AM logger, several minutes of static).
  • Switch on the AM compressor (second box from the top)
  • Switch on the AM Transmitter

If any of the following checks fail, switch off the transmitter immediately and follow Transmitter Troubleshooting.

  • Are all four power indicators on the left lit?
  • Is the fan on the rear of the unit spinning?
  • Is the forward power meter registering approximately 20W?
  • Is the reflected power meter registering a negligible level (2-3W is okay, slightly more if it's damp outside)?
  • Is the AM receiver showing 3 signal bars on its display?
  • After 5 minutes, two people should then check the AM loggers.
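If you want the meter readings checked against the figures above and written down for the incident report, the sketch below compares them using `awk`. The 18-22W band for "approximately 20W", the 3W reflected-power limit and "at least 3 bars" are assumptions read off the list above, not official limits; on a damp day a slightly higher reflected reading may be fine, so the Chief Engineer's judgement beats the script. The power indicators and rear fan still have to be checked by eye, and `tx_meters_ok` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: check transmitter meter readings that were noted by hand.
# Usage: tx_meters_ok FORWARD_WATTS REFLECTED_WATTS RECEIVER_BARS
# Limits are assumptions based on this page, not official figures:
#   forward power 18-22W, reflected power at most 3W, at least 3 bars.
tx_meters_ok() {
    awk -v fwd="$1" -v refl="$2" -v bars="$3" 'BEGIN {
        ok = 1
        if (fwd < 18 || fwd > 22) { print "FAIL: forward power " fwd "W is not about 20W"; ok = 0 }
        if (refl > 3)             { print "FAIL: reflected power " refl "W is above 3W"; ok = 0 }
        if (bars < 3)             { print "FAIL: receiver shows " bars " bars, expected 3"; ok = 0 }
        if (ok) print "OK: forward " fwd "W, reflected " refl "W, " bars " bars"
        exit (ok ? 0 : 1)
    }'
}
```

For example, `tx_meters_ok 20 2.5 3` prints an OK line you can paste into the incident report; any FAIL line means switching the transmitter off and following Transmitter Troubleshooting.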

Core Computing Services

Core Computing Services are defined as those which must be operational for URY to broadcast anything other than iTones (or, at this point, Intermission).

  • Power on uryfs1 and themis
  • Wait for uryfs1 to finish booting as most other systems are dependent on it
  • Run `mount -t ext4 /dev/sdb1 /music` on uryfs1
  • Power on ury
  • Fade up the jukebox in Studio 1, then switch to S1 and back to S3
    • This ensures selector state is up to date
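Waiting for uryfs1 and mounting `/music` can be scripted roughly as below. This is a sketch: `wait_until` and `mount_music` are hypothetical helpers, the ping-based readiness test assumes uryfs1 answers ICMP once it is up (which only shows its network stack is alive, not that booting has finished), and `mountpoint` assumes uryfs1 has util-linux installed.

```sh
#!/bin/sh
# Sketch: retry COMMAND every INTERVAL seconds until it succeeds or about
# TIMEOUT seconds of sleeping have passed. Time spent inside COMMAND is not
# counted, so slow commands stretch the real timeout.
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
wait_until() {
    _wu_timeout=$1
    _wu_interval=$2
    shift 2
    _wu_elapsed=0
    until "$@"; do
        if [ "$_wu_elapsed" -ge "$_wu_timeout" ]; then
            echo "gave up after ${_wu_elapsed}s: $*" >&2
            return 1
        fi
        sleep "$_wu_interval"
        _wu_elapsed=$((_wu_elapsed + _wu_interval))
    done
    return 0
}

# Run as root on uryfs1: mount the music store unless it's already mounted,
# so repeating this step does no harm. Needs util-linux's `mountpoint`.
mount_music() {
    if mountpoint -q /music; then
        echo "/music is already mounted"
    else
        mount -t ext4 /dev/sdb1 /music
    fi
}
```

From a machine on the URY network, `wait_until 600 10 ping -c 1 uryfs1` waits for uryfs1 to respond for roughly ten minutes; then, as root on uryfs1, `mount_music` performs the same mount as the step above but is safe to run twice.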

That's the absolute basics. Themis is pretty much optional, but as it provides some authentication and DNS services, it's best to bring it up too. Now verify the following are accessible and functioning:

We're now at a point where shows can go on and things will mostly be okay.

Additional Servers

Power on other systems:

  • urybsod
  • copperbox
  • urybackup0
    • Run `zfs mount -a`, `service nfsd restart`, `service mountd restart`
    • Windows and backup filestores should now be available. Mount the backup filestores on ury and uryfs1.
  • uryrrod
    • Only provides the mixclouder service, so it will have no immediately noticeable impact
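The urybackup0 steps can be wrapped in a small function so they run in order and stop at the first failure. This sketch assumes urybackup0 uses rc scripts named `nfsd` and `mountd` (as the `service` commands above imply) and has `showmount` available; the `DRY_RUN` switch and the `run` and `restore_backup_exports` names are hypothetical additions so you can preview the commands first. Mounting the exports on ury and uryfs1 is left out because the export paths and mount points aren't recorded on this page.

```sh
#!/bin/sh
# Sketch only: run as root on urybackup0.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step only runs if the previous one succeeded. The final showmount
# lists the exported filesystems so you can confirm the Windows and backup
# filestores are being served before mounting them elsewhere.
restore_backup_exports() {
    run zfs mount -a &&
        run service nfsd restart &&
        run service mountd restart &&
        run showmount -e localhost
}
```

Running `DRY_RUN=1 restore_backup_exports` first prints the four commands without touching anything; drop `DRY_RUN` to do it for real.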

Nearly Done

Everything should be bright and cheery again now. You should now complete a full incident report, making it available online in Category:Incident Reports and sending it to computing@ury.org.uk, engineering@ury.org.uk and management@ury.org.uk.

Ideally, you'd also act on any recommendations this review brings up to make things run better in future.