Brevo

Write-up

Issue with Contacts API V3 endpoints

Summary

On Friday, July 18th, between 19:17 UTC and 21:20 UTC, several users experienced failures when using the Contacts API V3 endpoints. This affected their ability to import contacts and perform contact-management operations through the public API V3.

Additionally, between 20:40 UTC and 21:20 UTC, all API endpoints were unavailable. Users attempting to access the Brevo external API during this time received errors when making their requests.

The root cause was a configuration problem introduced during internal infrastructure changes, which affected how the service connected to its underlying data layer.

The issue has been resolved, and all services are now operating normally.

What Was Impacted

During the incident window, users may have experienced:

Errors while attempting to import contacts
Inability to create or delete contact lists
Failures when performing certain contact management operations via API

Contact creation and updates remained functional throughout the incident. No data was lost, and the platform's core functionality was otherwise stable.

Root Cause

The issue began during an infrastructure migration, when a misconfiguration caused the application to lose connection with key backend systems. As a result, the contact-related API endpoints stopped working, and users were unable to perform certain contact operations.

Automated health checks then blocked traffic to these services while the issue was being diagnosed.

Resolution

The engineering team took the following steps to restore service:

Identified and isolated the misconfiguration affecting backend connectivity
Adjusted system health checks to allow traffic and support deeper diagnostics
Validated connectivity to internal data stores
Deployed a corrected version of the service and monitored traffic flow until full recovery
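
As an illustration of the health-check adjustment described above, one common pattern is to report liveness and readiness separately, so that a failed dependency stays visible for diagnostics without every failure removing the service from rotation. The following is a minimal sketch of that pattern, not Brevo's actual implementation; the function name, the dependency names ("datastore", "cache"), and the notion of a "critical" set are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class HealthReport:
    live: bool                # the process is running and can answer
    ready: bool               # the process should receive traffic
    details: Dict[str, str]   # per-dependency status, useful for diagnostics


def evaluate_health(checks: Dict[str, Callable[[], bool]],
                    critical: Set[str]) -> HealthReport:
    """Run each dependency check and derive liveness and readiness.

    Only dependencies listed in `critical` (a hypothetical setting) can
    mark the service as not ready. Other failures are recorded in
    `details` but do not block traffic.
    """
    details: Dict[str, str] = {}
    ready = True
    for name, check in checks.items():
        try:
            ok = check()
        except Exception as exc:  # a failing probe must not crash the endpoint
            ok = False
            details[name] = f"error: {exc}"
        else:
            details[name] = "ok" if ok else "failed"
        if not ok and name in critical:
            ready = False
    # Liveness stays true as long as this code can run at all.
    return HealthReport(live=True, ready=ready, details=details)
```

In this sketch, a cache outage would show up in `details` while the service keeps serving requests that do not need the cache. Which dependencies count as critical is a design decision that depends on the service.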

Next Steps

To prevent similar incidents in the future, we are taking the following actions:

Improve testing for configuration changes, especially those involving critical infrastructure like caching and data stores
Review and enhance service health-check strategies to avoid unnecessary traffic disruption
Enhance our existing observability and automated detection systems to more proactively identify misconfigurations and integration issues before they reach production
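
To make the first action above more concrete, configuration testing can start with a pre-deployment check that rejects a configuration whose data-store or cache endpoints are missing or malformed. The sketch below is illustrative only and does not describe Brevo's tooling; the setting names `CACHE_URL` and `DATASTORE_URL` and the example hostnames are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical setting names for the endpoints a service depends on.
REQUIRED_ENDPOINTS = ("CACHE_URL", "DATASTORE_URL")


def validate_config(config: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    for key in REQUIRED_ENDPOINTS:
        value = config.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
            continue
        parsed = urlparse(value)
        # A usable endpoint needs both a scheme and a host.
        if not parsed.scheme or not parsed.hostname:
            problems.append(f"{key} is not a valid URL: {value!r}")
    return problems


if __name__ == "__main__":
    candidate = {
        "CACHE_URL": "redis://cache.internal:6379",
        "DATASTORE_URL": "",
    }
    for problem in validate_config(candidate):
        print(problem)
```

A check like this only catches structural mistakes. Confirming that the endpoints are actually reachable from the new environment would still require a connectivity test in staging.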

We deeply regret the disruption this caused and recognize the trust you place in us. We are committed to learning from this incident and strengthening our processes to ensure platform reliability.

Thank you for your continued support and understanding.