January 13, 2016

WSO2 Cloud Incident Report: Jan 12, 2016

WSO2 Cloud experienced a serious service degradation on 12th January 2016: users were not able to log in to the cloud for a few hours.
Start time: 12th January 2016, 0833 PST
Recovery time: 12th January 2016, 1217 PST
Impact:
  • Users were not able to log in to the cloud.
  • Sign-up was not working.
  • The API Gateway kept functioning throughout the incident, serving API calls at the normal performance level. The only gateway downtime was 5 minutes, during the database restart.
Root cause:
  • One of the housekeeping tasks running in our Identity Servers failed because it could not acquire a lock on a database table. That table also stores user sessions, and because it was locked, the system could not complete new user logins.
  • Since we could not determine which component was keeping the table locked, we had to restart the database server to get the system back on track.
Corrective actions:
  • We have decreased the frequency of the aforementioned housekeeping task, as advised by our Identity Server team.
  • We have also raised a ticket with our internal support team to prevent possible future failures of this task.
  • We are investigating further to find out which component had the table locked and to fix it.
  • We are looking into our alerting and maintenance processes to ensure quicker resolution in the future.
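
The root cause described above (a lock held on the table that stores sessions, so new logins could not write their session rows) can be illustrated with a small, self-contained sketch. This is not the Identity Server code and does not use the production database, which this report does not name. It uses SQLite from the Python standard library as a stand-in, and SQLite locks the whole database file rather than a single table. The `sessions` table, the `holder` and `login` connections, and the function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def try_login(conn: sqlite3.Connection, user: str) -> bool:
    """Try to store a session row; return False if the write is blocked by a lock."""
    try:
        conn.execute("INSERT INTO sessions (user_id) VALUES (?)", (user,))
        return True
    except sqlite3.OperationalError:
        # SQLite raises this ("database is locked") once the timeout expires.
        return False


def demo(lock_timeout_s: float = 0.2) -> tuple:
    """Return (login result while the lock is held, login result after release)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sessions.db")
        # isolation_level=None means transactions are started and ended explicitly.
        holder = sqlite3.connect(path, isolation_level=None)
        login = sqlite3.connect(path, timeout=lock_timeout_s, isolation_level=None)
        try:
            holder.execute("CREATE TABLE sessions (user_id TEXT)")
            # Hypothetical stand-in for the unknown component holding the lock.
            holder.execute("BEGIN IMMEDIATE")
            during_lock = try_login(login, "alice")
            # Releasing the lock plays the role of the database restart.
            holder.execute("ROLLBACK")
            after_release = try_login(login, "alice")
            return during_lock, after_release
        finally:
            holder.close()
            login.close()


if __name__ == "__main__":
    print(demo())
```

In this sketch the login attempt fails while the other connection holds the write lock and succeeds once the lock is released. The short `timeout` makes the blocked writer fail fast instead of waiting indefinitely, which is one way such a condition can be surfaced to alerting.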