Changes

no edit summary
Line 2: Line 2:  
   |brief=We DDoS'd ourselves
 
   |severity=Moderate
 
  −
   |impact=Medium (Manual uploaded non-functional, one canceled show as a result - only a few seconds of dead air though)
+
   |impact=Medium (Manual uploaded non-functional, one canceled show as a result)
 
   |start=04/11/2019 09:00
 
  −
   |end=05/11/2019 02:00
+
   |end=05/11/2019 05:00
 
   |mitigation=Stop the self-DDoS, restart samba, cry
 
   |leader=Marks Polakovs (MP)
 
Line 11: Line 11:     
=URY is FUCKED (Frustratingly, URY Can't Keep Electronics Deferential)=
 
  −
'''This is still under construction because this literally just happened. When finished, I'll move this page into mainspace.'''
      
__TOC__
 
Line 26: Line 24:  
== Analysis ==
 
   −
This performance degradation is not new - looking at xymon graphs (specifically, CPU rolling average and I/O on urybackup0) suggests that the issue has been going on for many months. Anecdotally, presenters have reported that music upload performance has been slow. This could also be related to the jukebox-jokebox issues, as it may have trouble fetching song files over NFS.
+
This performance degradation is not new - looking at xymon graphs (specifically, CPU rolling average and I/O on urybackup0) suggests that the issue has been going on for many months. Anecdotally, presenters have reported that music upload performance has been slow. This could also be related to the jukebox-more-like-jokebox issues, as it may have trouble fetching song files over NFS.
    
What made it particularly noticeable this time was the ZFS backup job, which was the straw that broke the camel's back. AW speculates this could have made the RAID controllers I/O bound and unable to service requests in time.
 
Line 62: Line 60:  
== Long Term ==
 
   −
1. Upgrade Samba on urybackup0 to 4.10 - 4.6 is end-of-life
+
1. Upgrade Samba on urybackup0 to 4.10/4.11 - 4.6 is end-of-life (and has 10 CVEs)
    
:* May need to wait until after term-time, as this is a risky process
 
Line 113: Line 111:     
Around 05:00 - the backup script finishes and network and I/O performance returns to normal.
 
 +
 +
[[Category: Incident Reports]]