In the previous post, we discussed which failure modes can occur in distributed systems and what their concrete consequences are for application behavior.
In the previous, introductory post on why we need resilient software design, we discussed the stepwise journey from isolated monolithic applications to distributed system landscapes where applications continually communicate with each other. We also discussed that the number of peers involved has grown continually (and still grows), while the expectations for update propagation time became shorter and the availability expectations went to…
In this post series, I will discuss what resilience means for IT systems and why resilient software design has become mandatory.
In this post, I will discuss whether there is a difference between resilience and fault tolerance when talking about IT systems.
Resilience is, IMO, a huge topic. It has a broader scope than most people think and – even more importantly – it has become much more relevant for most of us than most people imagine. Thus, it is time to shed some light on this topic. Over the course of several future posts, I will discuss various aspects of resilience. Of course, my main focus will still be IT. But in some places I will leave the boundaries of IT, as this topic affects – and supports – us in many ways.