Pugpig Distribution Service Down
Incident Report for Pugpig
Resolved
This morning, from 09:11 UTC to about 10:10 UTC, the Pugpig Distribution Service failed to start following the deployment of an update.

After the failure to start, our engineers investigated and discovered that it was caused by hitting a rate limit on one of our AWS services (specifically, the AWS SSM Parameter Store). This was most likely the result of an unusually busy set of deployments across multiple environments coinciding by chance. Those deployments led to automatic restarts, which made it more likely that we would hit the rate limit.

To restore the service, we forcibly stopped all automatic restarts, then restarted the production Distribution Service manually.

During the outage, the Distribution Service was unavailable and content updates would have stalled.

We're currently implementing improved designs to avoid this issue in the future.
Posted Oct 02, 2023 - 10:10 UTC