Incident Report: 20191104: Difference between revisions
Updates per AW and IL |
No edit summary |
| (6 intermediate revisions by the same user not shown) | |||
| Line 2: | Line 2: | ||
|brief=We DDoS'd ourselves | |brief=We DDoS'd ourselves | ||
|severity=Moderate | |severity=Moderate | ||
|impact=Medium (Manual uploaded non-functional, one canceled show as a result | |impact=Medium (Manual uploaded non-functional, one canceled show as a result) | ||
|start=04/11/2019 09:00 | |start=04/11/2019 09:00 | ||
|end=05/11/2019 | |end=05/11/2019 05:00 | ||
|mitigation=Stop the self-DDoS, restart samba, cry | |mitigation=Stop the self-DDoS, restart samba, cry | ||
|leader=Marks Polakovs (MP) | |leader=Marks Polakovs (MP) | ||
| Line 11: | Line 11: | ||
=URY is FUCKED (Frustratingly, URY Can't Keep Electronics Deferential)= | =URY is FUCKED (Frustratingly, URY Can't Keep Electronics Deferential)= | ||
__TOC__ | __TOC__ | ||
| Line 26: | Line 24: | ||
== Analysis == | == Analysis == | ||
This performance degradation is not new - looking at xymon graphs (specifically, CPU rolling average and I/O on urybackup0) suggests that the issue has been going on for many months. Anecdotally, presenters have reported that music upload performance has been slow. This could also be related to the jukebox-jokebox issues, as it may have trouble fetching song files over NFS. | This performance degradation is not new - looking at xymon graphs (specifically, CPU rolling average and I/O on urybackup0) suggests that the issue has been going on for many months. Anecdotally, presenters have reported that music upload performance has been slow. This could also be related to the jukebox-more-like-jokebox issues, as it may have trouble fetching song files over NFS. | ||
What made it particularly noticeable this time was the ZFS backup job, which was the straw that broke the camel's back. AW speculates this could have made the RAID controllers I/O bound and unable to service requests in time. | What made it particularly noticeable this time was the ZFS backup job, which was the straw that broke the camel's back. AW speculates this could have made the RAID controllers I/O bound and unable to service requests in time. | ||
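The analysis above leans on a rolling average to separate a long-running degradation from a one-off spike. A minimal sketch of that idea follows. It is not how xymon computes its graphs; the function names, window size, threshold and sample values are all made up for illustration.

```python
from collections import deque


def rolling_mean(samples, window):
    """Return the trailing mean of each full window of `window` samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)
    total = 0.0
    means = []
    for value in samples:
        if len(buf) == window:
            # The oldest sample is about to be evicted by append().
            total -= buf[0]
        buf.append(value)
        total += value
        if len(buf) == window:
            means.append(total / window)
    return means


def sustained_breach(samples, window, threshold):
    """True if any rolling mean exceeds `threshold`."""
    return any(mean > threshold for mean in rolling_mean(samples, window))


# Hypothetical load samples, NOT taken from urybackup0.
one_off_spike = [1.0, 1.2, 0.9, 6.0, 1.1, 1.0]
long_running = [3.0, 3.2, 3.1, 3.3, 3.4, 3.2]

print(sustained_breach(one_off_spike, window=4, threshold=2.5))
print(sustained_breach(long_running, window=4, threshold=2.5))
```

With a wide enough window a single spike is averaged away, while a baseline that has crept up over months stays above the threshold. That is the pattern the graphs are described as showing.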
| Line 62: | Line 60: | ||
== Long Term == | == Long Term == | ||
1. Upgrade Samba on urybackup0 to 4.10 - 4.6 is end-of-life | 1. Upgrade Samba on urybackup0 to 4.10/4.11 - 4.6 is end-of-life (and has 10 CVEs) | ||
:* May need to wait until after term-time, as this is a risky process | :* May need to wait until after term-time, as this is a risky process | ||
| Line 113: | Line 111: | ||
Around 05:00 - the backup script finishes and network and I/O performance returns to normal. | Around 05:00 - the backup script finishes and network and I/O performance returns to normal. | ||
[[Category: Incident Reports]] | |||