For our critical systems, fault analysis and game days often come down to what we think we know about call flow, call depth, and what sits at the leaves of the call graph, such as data storage. Less often do they come down to what can actually take us out.
Which configuration changes can total a system, and which configuration interactions can.
Which codebase paths, when exercised with unexpected input, will crater.
Which error handlers don't do what we think they do.
What givens are on the path (the ones you don't draw because they're details, or vendors), and what happens when they fail.
Which things, should they fail, can't be rolled back and have a time-to-recovery problem.
Who are the people who can fix those things when they break.
What are your unanticipated failure modes?
You can follow @dehora.