Changes

no edit summary
Line 2: Line 2:  
   |brief=A segmentation fault on our web server caused cascading failures on all URY Computing Services
 
   |brief=A segmentation fault on our web server caused cascading failures on all URY Computing Services
 
   |severity=Critical
 
   |severity=Critical
   |impact=High (Complete failure of URY Computing Services for 37+ minutes)
+
   |impact=High (Complete failure of many URY Computing Services for 37+ minutes)
 
   |start=02/02/2014 23:00
 
   |start=02/02/2014 23:00
 
   |end=02/02/2014 23:49
 
   |end=02/02/2014 23:49
Line 33: Line 33:  
* The segmentation fault that causes MyRadio to fail is still under investigation to identify the root cause.
 
* The segmentation fault that causes MyRadio to fail is still under investigation to identify the root cause.
 
* The stage of processing that MyRadio appears to be in at the time of the segfault means that a database connection is opened, but does not get cleanly closed due to the crash. This left idle connections on the database, which over time caused other systems to fail.
 
* The stage of processing that MyRadio appears to be in at the time of the segfault means that a database connection is opened, but does not get cleanly closed due to the crash. This left idle connections on the database, which over time caused other systems to fail.
* MyRadio currently runs as a database superuser.
      
== Work Required ==
 
== Work Required ==
Line 41: Line 40:  
* Monitoring of system failures of this nature needs to be reviewed and improved, including automated reconnection of IRC bots and email reporting.
 
* Monitoring of system failures of this nature needs to be reviewed and improved, including automated reconnection of IRC bots and email reporting.
 
* A behaviour change of URY members is required to ensure that problems are reported through the correct channels. Lloyd Wallis is not a correct channel for reporting problems.
 
* A behaviour change of URY members is required to ensure that problems are reported through the correct channels. Lloyd Wallis is not a correct channel for reporting problems.
* MyRadio should not be configured to run as a database superuser.
      
[[Category:Incident Reports]]
 
[[Category:Incident Reports]]
