We had an outage at one of @Cloudflare's core data processing facilities today. While our network and security services largely continued to run globally, our APIs and dashboard have seen significant failures for the past few hours. 1/5
The outage was caused when planned maintenance decommissioned a rack of equipment that was supposed to be redundant. While the equipment itself was redundant, the cabinet contained a critical patch panel. Its removal caused multiple independent network connections to fail. 2/5
We have a backup facility for situations like this but chose not to cut over to it, due both to technical concerns and a belief that we could get the primary facility back online faster than it turned out we could. 3/5
Connectivity to the primary facility has been restored and our APIs and dashboard are coming back online. This never should have happened. 4/5
We are doing a full post mortem internally and will be publishing a writeup to our blog as soon as we have API and dashboard services fully restored and understand the mistakes we made. 5/5