Incident Report: 20131201: Difference between revisions
| Line 53: | Line 53: | ||
== Changes Made == | == Changes Made == | ||
The monitor in the server cupboard has been moved from mains power to UPS power - this is needed to power off the loggers gracefully as they have AT power supplies. The next cable tidy-up in that rack should remove the excess mains cable. | * The monitor in the server cupboard has been moved from mains power to UPS power - this is needed to power off the loggers gracefully as they have AT power supplies. The next cable tidy-up in that rack should remove the excess mains cable. | ||
uryfs1 will now detect missing backup mounts and attempt to restore them. | * uryfs1 will now detect missing backup mounts and attempt to restore them. | ||
Removed the need to enter console passwords on boot for ury and uryfs1 | * Removed the need to enter console passwords on boot for ury and uryfs1 | ||
Decrypted the SSL certificates and reviewed permissions on their storage directory | ** Decrypted the SSL certificates and reviewed permissions on their storage directory | ||
== Changes to be Made == | == Changes to be Made == | ||
* Investigate Stores Distribution Board upgrade | * Investigate Stores Distribution Board upgrade | ||
Re-raise the issue with Estates and YUSU | ** Re-raise the issue with Estates and YUSU | ||
Prevent uryfs1 and other servers from backing up if the mount fails to be brought up | * Prevent uryfs1 and other servers from backing up if the mount fails to be brought up | ||
Anyone want to practice their bash-fu? | ** Anyone want to practice their bash-fu? | ||
Ensure urybackup0 mounts /pool0 and /pool1 on boot | * Ensure urybackup0 mounts /pool0 and /pool1 on boot | ||
Currently need to run zfs mount pool0 && zfs mount pool1 | ** Currently need to run zfs mount pool0 && zfs mount pool1 | ||
Ensure uryfs1 mounts /music on boot | * Ensure uryfs1 mounts /music on boot | ||
Server seems to get its filesystem types confused | ** Server seems to get its filesystem types confused | ||
Remember to power uryfw0 up first | * Remember to power uryfw0 up first | ||
Much spinning waiting on NTP etc. otherwise (partly not our fault) | ** Much spinning waiting on NTP etc. otherwise (partly not our fault) | ||
Add checking backup mounts, uryrrod to our standard boot procedure, plus generally update the document | * Add checking backup mounts, uryrrod to our standard boot procedure, plus generally update the document | ||
https://docs.google.com/document/d/12gdrkNWPqC0hc0sJ1ETqM9TXCwmQy3ZBma1anc8ZO7M/edit | ** https://docs.google.com/document/d/12gdrkNWPqC0hc0sJ1ETqM9TXCwmQy3ZBma1anc8ZO7M/edit | ||
Review policy for communication on social media during a failure | * Review policy for communication on social media during a failure | ||
Should named Technical persons have access for these scenarios? | ** Should named Technical persons have access for these scenarios? | ||
Review policy for calling named persons at 2am (this kind of thing normally happens during the day) | * Review policy for calling named persons at 2am (this kind of thing normally happens during the day) | ||
Would Al have rather been woken up than not know? | ** Would Al have rather been woken up than not know? | ||
Improve documentation on Outside Broadcast systems and standard contingency plans if 802.1x is unavailable on the campus network | * Improve documentation on Outside Broadcast systems and standard contingency plans if 802.1x is unavailable on the campus network | ||
Known non-NAS/opfa ports, IP addresses available to us | ** Known non-NAS/opfa ports, IP addresses available to us | ||
Teach engineering ‘how to tell what a port is’? | ** Teach engineering ‘how to tell what a port is’? | ||
We should generally have a how-to guide for the freshers | ** We should generally have a how-to guide for the freshers | ||
Review whether the stores distribution board should be touched (I mean, look at it) | * Review whether the stores distribution board should be touched (I mean, look at it) | ||
Talk to Estates/Health and Safety? Last time I did I got a “What the f***” from the electrician | ** Talk to Estates/Health and Safety? Last time I did I got a “What the f***” from the electrician | ||
Investigate automatic shutdown options? | * Investigate automatic shutdown options? | ||
We now know the switch/jukebox UPS lasts a good length of time, so the very short comms runtime (~90s) that was the limiting factor last time this was considered is no longer a problem. | ** We now know the switch/jukebox UPS lasts a good length of time, so the very short comms runtime (~90s) that was the limiting factor last time this was considered is no longer a problem. | ||
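For the "prevent backing up if the mount fails" item, one possible starting point is a small guard that refuses to run a backup command unless the target is actually listed as a mount point. This is only a sketch: the function names `is_mounted` and `guarded_backup` and the `MOUNTS_FILE` override are made up for illustration, and it assumes a Linux-style `/proc/mounts`. It is not what uryfs1 currently runs.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from any existing URY script.

# is_mounted MOUNTPOINT
# Succeeds only if MOUNTPOINT appears as a mount point (second field)
# in the mounts table. MOUNTS_FILE is an assumed override for testing;
# it defaults to /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${MOUNTS_FILE:-/proc/mounts}"
}

# guarded_backup MOUNTPOINT COMMAND [ARGS...]
# Runs COMMAND only when MOUNTPOINT is mounted; otherwise prints an
# error and returns non-zero so cron or a wrapper can flag the failure.
guarded_backup() {
    target=$1
    shift
    if ! is_mounted "$target"; then
        echo "refusing to back up: $target is not mounted" >&2
        return 1
    fi
    "$@"
}
```

Usage would look something like `guarded_backup /pool0 some-backup-command args`, where the backup command is whatever the existing job already runs. Checking the mounts table rather than just testing that the directory exists matters here, because an unmounted mount point is still an ordinary empty directory and a backup would quietly fill the root filesystem instead.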
Hopefully this’ll be of some use in the future. | Hopefully this’ll be of some use in the future. | ||