WSO2 Cloud Incident Report: Jan 12, 2016

WSO2 Cloud experienced a serious service degradation on 12th January 2016: users were unable to log in to the cloud for a few hours.

Start time: 12th January 2016, 0833 PST
Recovery time: 12th January 2016, 1217 PST

Impact:

  • Users were not able to log in to the cloud.
  • Sign-up was not working.
  • The API Gateway kept serving API calls at normal performance levels throughout the incident, apart from about 5 minutes of gateway downtime during the database restart: http://uptime.cloud.wso2.com/

Root cause:

  • One of the housekeeping tasks running on our Identity Servers failed because it could not acquire a lock on a database table. That table also stores user sessions, and because it remained locked, the system could not complete new user logins.
  • Because we could not determine which component was holding the table lock, we had to restart the database server to get the system back on track.
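
To illustrate the kind of contention described above, here is a minimal sketch of one common way to keep a cleanup job from holding locks on a busy table for long: delete in small batches and commit after each one. This is not the Identity Server's actual housekeeping task. It assumes, purely for illustration, that the task removes expired rows from a session table; the `sessions` table, its columns, and the `purge_expired_sessions` helper are all hypothetical, and SQLite stands in for the production database.

```python
import sqlite3


def purge_expired_sessions(conn, now, batch_size=100):
    """Delete expired rows in small batches; return the total rows deleted.

    Hypothetical helper: each batch runs in its own short transaction,
    so other writers get a chance to use the table between batches.
    """
    total = 0
    while True:
        with conn:  # commits (or rolls back) this batch only
            cur = conn.execute(
                "DELETE FROM sessions WHERE id IN ("
                "SELECT id FROM sessions WHERE expires_at < ? LIMIT ?)",
                (now, batch_size),
            )
            deleted = cur.rowcount
        total += deleted
        if deleted < batch_size:  # last (possibly empty) batch reached
            return total


def demo():
    # Assumed schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (id INTEGER PRIMARY KEY, expires_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sessions (expires_at) VALUES (?)",
        [(t,) for t in range(250)],
    )
    conn.commit()
    deleted = purge_expired_sessions(conn, now=200, batch_size=100)
    remaining = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
    return deleted, remaining
```

Calling `demo()` purges the 200 expired rows in batches of 100 and leaves the 50 unexpired ones. Batching does not explain why the lock was held in this incident; it only limits how long any single cleanup transaction can block logins.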

Actions:

  • We have decreased the frequency of the aforementioned housekeeping tasks as advised by our Identity Server team.
  • We have also raised a ticket with our internal support team to prevent future failures of this task.
  • We are investigating further to identify which component held the table lock and to fix it.
  • We are reviewing our alerting and maintenance processes to ensure faster resolution in the future.
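
As one example of what the alerting work could look like, the sketch below raises an alert after several consecutive failed login probes. It is an assumption-laden illustration, not our monitoring setup: the `LoginProbeAlert` class, its threshold of 3, and the idea of a periodic synthetic login are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoginProbeAlert:
    """Hypothetical alert state for a periodic synthetic login check."""

    threshold: int = 3            # consecutive failures needed to alert
    consecutive_failures: int = 0
    firing: bool = False

    def record(self, login_ok: bool) -> bool:
        """Record one probe result; return True only when the alert newly fires."""
        if login_ok:
            # A successful login clears the failure streak and the alert.
            self.consecutive_failures = 0
            self.firing = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.firing:
            self.firing = True
            return True
        return False
```

Requiring a streak of failures avoids paging on a single slow probe, and returning `True` only on the transition keeps an ongoing outage from sending a notification on every check.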
