General
I recently had a short discussion with a Product Owner after a talk I gave about resilient software design. His question was: “How can I motivate developers to care more about resilience?”
I briefly mentioned the 100% availability trap in a prior post. As this misconception is so widespread, I decided to discuss it in more detail in this post.
In the previous post, we discussed why the imponderabilities of distributed systems will hit us at the application level and we cannot leave their handling to the operations teams as we did in the past.
In the previous post, we discussed availability, and how more nodes and the effects of remote communication affect it negatively. We learned that failures in today’s distributed, highly interconnected system landscapes are unavoidable and that we need to embrace them if we want to create highly available solutions.
In the previous post, we discussed what distributed systems mean in terms of failure modes that can occur and what their concrete consequences are regarding application behavior.