Incident Report – January 29-30, 2013
Between 12:44 a.m. and 2:31 p.m. EST on Jan. 29, and again between 10:05 a.m. and 1:32 p.m. EST on Jan. 30, parts of our service were sporadically unavailable because unusually high traffic overloaded our servers.
During that time, respondents experienced slow form loading, and many submissions failed due to timeouts. The incident affected customers on our Starter, Pay-As-You-Go, Basic, and Professional plans. Customers on our Enterprise Plan were not affected.
The incident on the 30th was more severe and sustained than the one on the previous day (Uptime Report).
Following the incident on the 29th, our first effort went into fixing a server configuration issue: the server was accepting more concurrent connections than it could handle, which saturated its RAM and CPU. The heavy traffic recurred on the 30th before we had time to make further progress.
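To illustrate the kind of configuration issue described above, here is a minimal sketch of how a connection limit can be derived from available memory. The function name and all figures are hypothetical and chosen only for the example; they are not our actual server settings.

```python
def max_concurrent_connections(total_ram_mb, reserved_mb, per_request_mb):
    """Return the largest number of simultaneous requests that fits in RAM.

    total_ram_mb   -- physical memory on the server (hypothetical figure)
    reserved_mb    -- memory set aside for the OS, database, caches, etc.
    per_request_mb -- estimated memory used by one in-flight request
    """
    if per_request_mb <= 0:
        raise ValueError("per_request_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Floor division: a partial request slot does not count.
    return max(0, usable_mb // per_request_mb)


# Hypothetical server: 8 GB of RAM, 2 GB reserved, 64 MB per request.
limit = max_concurrent_connections(total_ram_mb=8192, reserved_mb=2048, per_request_mb=64)
print(limit)  # 96
```

A server configured to accept more connections than this kind of estimate allows will start swapping or queuing under load, which is consistent with the timeouts respondents saw.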
On the 30th, after spending some time monitoring traffic and server capacity, we decided to bring our standby server online and redirect some traffic to that server. This returned us to normal operations at around 1:30 p.m.
Overall, we are not satisfied with how things went. The traffic spike caught us off guard, and we took too long to bring additional capacity online.
Some of the issues that we’re going to address immediately:
- More RAM. The memory profile of each HTTP request depends on what each form is configured to do. File uploads or Salesforce integration, for instance, consume more memory than other requests. When traffic is high, this can tip a server over capacity, so we need to account for this more carefully.
- Add a spare server to handle peak load. We’ll make an announcement when the server is ready. Customers who use IP whitelisting in Salesforce (or other services) will need to update their configuration accordingly.
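As a rough illustration of the memory point above, the sketch below estimates how much memory a given number of simultaneous requests would need when different request types have different costs. The request categories, traffic shares, and per-request sizes are assumptions made up for the example, not measurements from our servers.

```python
def estimated_memory_mb(concurrent, mix, cost_mb):
    """Estimate memory in use when `concurrent` requests run at once.

    mix     -- share of traffic per request type; shares must sum to 1
    cost_mb -- assumed memory cost of one request of each type
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    # Weighted average cost of one request across the traffic mix.
    average_mb = sum(share * cost_mb[kind] for kind, share in mix.items())
    return concurrent * average_mb


# Hypothetical traffic mix and per-request costs.
mix = {"plain": 0.80, "file_upload": 0.15, "salesforce": 0.05}
cost_mb = {"plain": 32, "file_upload": 128, "salesforce": 96}

print(round(estimated_memory_mb(100, mix, cost_mb)))  # roughly 4960 MB
```

Even a small shift in the mix toward heavier requests raises the average noticeably, which is why sizing a server on the typical request alone can leave it short during a spike.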
Beyond this, we will continue to work on further infrastructure improvements. We’re fortunate to have a growing customer base, and it’s our responsibility to make sure we deliver a top-class service.
You can always check our service status at https://status.formassembly.com or follow @FormAssembly on Twitter for updates.
We apologize for the downtime and the inconvenience caused. We’re incredibly grateful for your patience and support. Please let us know if you have any feedback or comments regarding this issue.
The FormAssembly Team