We had an outage at one of @Cloudflare's core data processing facilities today. While our network and security services largely continued to run globally, our APIs and dashboard have seen significant failures for the past few hours. 1/5
The outage was caused when planned maintenance decommissioned a rack of equipment that was supposed to be redundant. While the equipment itself was redundant, the cabinet contained a critical patch panel. Its removal caused multiple independent network connections to fail. 2/5
We have a backup facility for situations like this but chose not to cut over to it, due both to technical concerns and a belief that we could get the primary facility back online faster than it turned out we could. 3/5
Connectivity to the primary facility has been restored and our APIs and dashboard are coming back online. This never should have happened. 4/5
We are doing a full post mortem internally and will be publishing a writeup to our blog as soon as we have API and dashboard services fully restored and understand the mistakes we made. 5/5