We initiated a long-planned technical upgrade, which inadvertently generated excessive traffic to the APIs, causing some to become unavailable.
Our attempts to revert the upgrade required API access, which was itself impaired by the excess traffic.
To break this cycle, we scaled up our resources and then restarted the load balancer.
After that, all services were restored to full operation.
Beginning at 06:41AM CET, the following services were degraded. They initially remained available with noticeable delays, then became completely unavailable:
transactional email sending
marketing email sending
transactional SMS sending
marketing SMS sending
contacts: customers could not create or update their contacts
webhooks: webhooks were no longer triggered
automation: workflows stopped running
All affected services were fully restored by 09:44AM CET.
No data was lost during the incident. Processes that were delayed or interrupted resumed normal operation and cleared their backlogs.
06:00AM CET: maintenance started
06:41AM CET: some APIs started to become unavailable
09:44AM CET: incident was fully resolved
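
The durations implied by the timeline above can be worked out with a short sketch. The three timestamps are copied from the timeline. The dictionary keys and the `minutes_between` helper are illustrative names only, and the calculation assumes all three events fell on the same day.

```python
from datetime import datetime

# Timestamps copied from the timeline (all CET, assumed to be the same day).
# The key names are illustrative and do not come from the report.
TIMELINE = {
    "maintenance_started": "06:00",
    "apis_unavailable": "06:41",
    "resolved": "09:44",
}


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Time from the start of maintenance to the first API failures.
time_to_impact = minutes_between(
    TIMELINE["maintenance_started"], TIMELINE["apis_unavailable"]
)

# Time from the first API failures to full resolution.
impact_duration = minutes_between(
    TIMELINE["apis_unavailable"], TIMELINE["resolved"]
)

print(f"Maintenance start to first API failures: {time_to_impact} min")
print(f"First API failures to full resolution: {impact_duration} min")
```

The second figure covers the whole window from the first degradation to full resolution. The report does not state when the services moved from delayed to completely unavailable, so that split cannot be derived from these timestamps.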