In the previous post, we discussed availability and how additional nodes and the effects of remote communication reduce it. We learned that failures in today’s distributed, highly interconnected system landscapes are unavoidable and that we need to embrace them if we want to create highly available solutions.
In the previous post, we discussed which failure modes can occur in distributed systems and what their concrete consequences are for application behavior.
In the previous, introductory post on why we need resilient software design, we discussed the stepwise journey from isolated monolithic applications to distributed system landscapes where applications continually communicate with each other. We also discussed that the number of peers involved has grown continually (and is still growing), while the expected update propagation times became shorter and the availability expectations went to…
In this post series, I will discuss what resilience means for IT systems and why resilient software design has become mandatory.
In this post, I will discuss whether there is a difference between resilience and fault tolerance when talking about IT systems.