Don't let your judgement be clouded

In Buildkite’s Aug 22 outage post-mortem, I read the (refreshingly honest and transparent) story of a cascading set of failures whose root cause was poor change management. An interesting takeaway from the story is that many of the failures are specific to the “cloud” / IaaS platform design.

This is a companion discussion topic for the original entry at