Resilience
In the previous post, we discussed why the imponderables of distributed systems will hit us at the application level and why we can no longer leave their handling to the operations teams as we did in the past.
In the previous post, we discussed availability, and how more nodes and the effects of remote communication affect it negatively. We learned that failures in today’s distributed, highly interconnected system landscapes are unavoidable and that we need to embrace them if we want to create highly available solutions.
In the previous post, we discussed which failure modes can occur in distributed systems and what concrete consequences they have for application behavior.
In the previous, introductory post on why we need resilient software design, we discussed the stepwise journey from isolated monolithic applications to distributed system landscapes in which applications continually communicate with each other. We also discussed how the number of peers involved continually grew (and still grows), while the expected update propagation times became shorter and the availability expectations went to…
In this post series, I will discuss what resilience means for IT systems and why resilient software design has become mandatory.