Hotfix!
Backend updates
2.47.02: Migrate Tokening Service
This morning, planned updates to MongoDB Atlas caused calls to our tokening service to time out. MongoDB, in their infinite generosity, will not share an RCA for the outage unless we move to the next billing tier. Whatever broke on their side, the result for us was the same: downtime for our API and an accelerated effort to get the heck off MongoDB.
Once we were aware of the issue, we refactored the calls that depended on Atlas and redeployed. Total downtime was approximately 68 minutes.
Actions Taken
In this update, we move the entirety of our tokening and request-increment workloads to an AWS-managed service that plays very nicely with our Lambda functions.
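To give a sense of the new request path, here is a minimal sketch assuming the AWS-managed store is DynamoDB; the table name, attribute names, and header key are placeholders for illustration, not our actual schema.

```python
# Minimal sketch of the new tokening path, assuming DynamoDB as the
# AWS-managed store. The table "api_tokens" and its attributes are
# illustrative placeholders, not our actual schema.
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
tokens_table = dynamodb.Table("api_tokens")

def handler(event, context):
    token = event["headers"].get("x-api-token", "")
    try:
        # Validate the token and bump its request count in one atomic call.
        response = tokens_table.update_item(
            Key={"token": token},
            UpdateExpression="ADD request_count :one",
            ConditionExpression="attribute_exists(#tok)",
            ExpressionAttributeNames={"#tok": "token"},
            ExpressionAttributeValues={":one": 1},
            ReturnValues="ALL_NEW",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            # Unknown token: reject rather than creating a new record.
            return {"statusCode": 401, "body": "invalid token"}
        raise
    count = response["Attributes"]["request_count"]
    return {"statusCode": 200, "body": f"requests used: {count}"}
```

Because the lookup and the increment happen in a single conditional update, there is no separate read-then-write round trip to time out or race.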
We have load tested this service extensively today and the results are stellar: low latency and high reliability even at extraordinary call volumes. We are also mitigating timeout-related issues by adding retries and failover to our tokening workloads (a rough sketch of that pattern follows the latency figures below). In load testing with up to 20 virtual users simultaneously sending high call volumes, we achieve:
- P90: 496 ms
- P95: 510 ms
- P99: 699 ms
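As for the retry-and-failover piece, the sketch below shows the general pattern; the attempt count, backoff delays, and fallback hook are illustrative defaults, not the exact values our tokening workloads run with.

```python
# Illustrative retry-with-backoff wrapper for tokening calls. The attempt
# count, delays, and fallback behaviour are example values only.
import random
import time

def with_retries(call, fallback=None, attempts=3, base_delay=0.05):
    """Run `call`, retrying with exponential backoff plus jitter.

    If every attempt fails and a `fallback` is provided (for example, a
    read from a secondary table or a cached token record), return the
    fallback result instead of surfacing the error to the API caller.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as err:
            last_error = err
            if attempt < attempts - 1:
                # Back off 50 ms, 100 ms, 200 ms, ... with a little jitter.
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.02))
    if fallback is not None:
        return fallback()
    raise last_error

# Hypothetical usage alongside the handler sketched earlier:
#   record = with_retries(lambda: lookup_and_increment(token),
#                         fallback=lambda: cached_token_record(token))
```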