(Un)coupling in distributed systems - Part 1

Understanding functional coupling

Uwe Friedrichsen

10 minute read

Coupling is a big issue in software design. With software landscapes becoming more and more complex, coupling painfully steps on our toes whenever we attempt to change things. Hence, we want to reduce coupling. On the other hand, without any coupling, systems and their parts would not be able to interact. Hence, we need coupling – it feels a bit like being stuck between a rock and a hard place.

This is not a new discussion. It is at least 50 years old. Just to name a few examples:

  • Stevens, Myers and Constantine already discussed the topic in 1974 in their seminal paper “Structured design” [1], where they also introduced the timeless design principle of “high cohesion, low (loose) coupling”.
  • You could fill whole books with the discussions around coupling – and whole books have indeed been filled, e.g., the recently published “Balancing Coupling in Software Design” by Vlad Khononov.
  • And of course you repeatedly find presentations discussing the topic, e.g., the highly recommended talk “Uncoupling” by Michael Nygard (see, e.g., the GOTO 2018 recording).

This is just the tip of the iceberg. Hence, there is a lot we could discuss regarding coupling. We could discuss essential versus accidental coupling, i.e., the coupling we need to get the job done versus all the other coupling we accidentally introduce without any added value. We could discuss the types of coupling the “Structured design” paper describes. Or the types of coupling that connascence describes. Or the distinctions Michael Nygard makes in his talk. Or the different facets of coupling Vlad Khononov discusses in his book. Or … the options are legion.

However, my aspiration is not to discuss the topic in general. I am afraid this would result in another book (and we already have Vlad’s book). Instead, I would like to discuss a specific aspect of coupling in the context of remote communication that, based on my observations, is quite poorly understood.

To be precise, I would like to discuss what it takes to have two processes [2], e.g., (micro)services, loosely coupled.

Technical coupling

Whenever this question comes up, the standard answer is “messaging”: Use messaging as the communication style between the two processes – no matter if command-based, event-based or document-based – et voilà: We are loosely coupled.

Well, are we?

Based on my observations, the answer to this question is not as simple as most people think.

Switching from a synchronous, request-response-based communication style to an asynchronous, message-based one definitely is a step towards loose coupling. But it is not enough: Such a switch only reduces technical coupling.

Technical coupling is about the degree of (un)coupling based on technical means, here what you can achieve by replacing one technical communication style with another.

The other relevant aspects of (un)coupling in this context are still missing – and based on my observations they are ignored most of the time. Thus, let us have a look at them.

Functional coupling

When it comes to coupling between processes, functional coupling is probably a lot more important than technical coupling. Or, to put it even more strongly:

Unless we ensure loose functional coupling, caring about technical coupling is pointless.

This raises the question: What exactly do I mean by “functional coupling”?

Using a non-formal definition, functional coupling is the degree to which processes depend on each other at a functional, i.e., business logic level to get their respective jobs done.

Let us illustrate this concept using a simple example: Imagine an eCommerce solution. Part of that solution is letting the customer search for products. Search in an eCommerce solution usually contains a lot more functionality than just querying a product catalog. We want to show customized prices, depending on the customer, on ongoing campaigns, etc. We want to add sponsored products to the search result list. And more. Hence, we decide to provide a search service that encapsulates all the search magic. It provides a simple search API and takes care of all the intricacies under the hood.

To do its work, the search service requires more information, e.g., the product catalog, the customized product prices, information about the customer, information about ongoing campaigns and sponsoring, and so on. As this information is required not only by the search service but also in other places, we encapsulate it in services of their own: A product service, a pricing service, a customer service, a campaign service, a sponsoring service, etc.

From a design perspective, we just did what we have learned about good design: Separate independent concerns and encapsulate them in different building blocks. Additionally, if a functionality is used in multiple places, encapsulate the functionality in a separate building block and provide it to the using building blocks. As our building blocks are services, we encapsulate the different concerns in services. Straightforward application of design best practices. So far, so good.

However, if we look at the runtime properties of the search service, we make an unpleasant discovery: Due to the way the functionality is spread across the services, the search service can fulfill its job only if all the other services it needs to access are available while it processes an external request.

In other words: At a functional level, the search service is tightly coupled to the services it uses. It cannot fulfill its job without the functionality provided by the other services it depends on.

This is what functional coupling in the context of distributed systems is about: Can a process A still do its job if another process B is not available?

  • If yes, process A is loosely coupled to process B at a functional level.
  • If no, process A is tightly coupled to process B at a functional level.
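The availability question above can be turned into a small experiment. The following Python sketch is only an illustration under stated assumptions: `StubService`, `ServiceUnavailable`, `search` and `can_do_job_without` are hypothetical names, the "remote" services are in-memory stand-ins, and the sample data is made up. It switches off one dependency at a time and checks whether the search can still do its job.

```python
class ServiceUnavailable(Exception):
    """Raised by a (hypothetical) remote stub that cannot be reached."""


class StubService:
    """In-memory stand-in for a remote service; `available` simulates an outage."""

    def __init__(self, name, data, available=True):
        self.name = name
        self.data = data
        self.available = available

    def fetch(self):
        if not self.available:
            raise ServiceUnavailable(self.name)
        return self.data


def search(query, product, pricing, campaign):
    """Search that needs every dependency while it handles the request."""
    products = [p for p in product.fetch() if query in p]
    prices = pricing.fetch()      # raises if the pricing service is down
    campaigns = campaign.fetch()  # raises if the campaign service is down
    return [
        {"product": p, "price": prices[p], "campaign": campaigns.get(p)}
        for p in products
    ]


def can_do_job_without(down_service_name):
    """Can the search still do its job if the named service is unavailable?"""
    services = {
        "product": StubService("product", ["red shoe", "blue shoe"]),
        "pricing": StubService("pricing", {"red shoe": 50, "blue shoe": 60}),
        "campaign": StubService("campaign", {"red shoe": "summer sale"}),
    }
    services[down_service_name].available = False
    try:
        search("shoe", **services)
        return True
    except ServiceUnavailable:
        return False
```

In this sketch, `can_do_job_without` returns `False` for each of the three services, i.e., the search is tightly coupled to all of them at a functional level.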

Of course, functional coupling is not binary, i.e., either loose or tight. Rather, it is a spectrum. Often, we implement mitigation strategies to reduce the functional coupling. In our example, the search service could use the default article price included in the product catalog if the pricing service fails to deliver custom prices in time. It could also just skip campaign pricing and sponsored articles if the respective services fail to respond in time. This way, a service can reduce a tight functional coupling. [3]

However, this is still not loose functional coupling because the reduction of coupling also results in a reduced service level at a functional level: The customer does not get the probably lower campaign prices, and sponsored articles are not shown in the search results even though suppliers paid for them. We can mitigate the undesirable effects of tight functional coupling between processes, but we cannot completely eradicate them and magically turn a tight functional coupling into a loose one.
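A minimal Python sketch of such a mitigation could look as follows. Everything in it is an assumption made for illustration: the helper names `call_with_fallback`, `remote_value` and `search`, the 50 ms timeout and the sample values. The remote services are simulated with `asyncio.sleep`. Note the `degraded` flag: The fallback keeps the search responsive, but the result is functionally poorer.

```python
import asyncio


async def call_with_fallback(remote_call, fallback, timeout):
    """Await `remote_call`; on timeout or connection error use `fallback`.

    Returns a pair (value, degraded).
    """
    try:
        return await asyncio.wait_for(remote_call, timeout), False
    except (asyncio.TimeoutError, ConnectionError):
        return fallback, True


async def remote_value(value, delay):
    """Hypothetical remote call; `delay` simulates the response time."""
    await asyncio.sleep(delay)
    return value


async def search(pricing_delay, sponsoring_delay, timeout=0.05):
    catalog_price = 50.0  # default article price from the product catalog
    price, price_degraded = await call_with_fallback(
        remote_value(42.0, pricing_delay), catalog_price, timeout
    )
    sponsored, sponsored_degraded = await call_with_fallback(
        remote_value(["sponsored boot"], sponsoring_delay), [], timeout
    )
    return {
        "price": price,
        "sponsored": sponsored,
        # True if at least one fallback was used, i.e., reduced service level
        "degraded": price_degraded or sponsored_degraded,
    }
```

Calling `asyncio.run(search(pricing_delay=0.5, sponsoring_delay=0.0))` in this sketch yields the catalog price instead of the custom price and marks the result as degraded. Whether such a fallback is acceptable remains a business decision.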

But how can applying design best practices lead to tight coupling and such an undesired runtime behavior? Again, there is a lot we could discuss here, but the key point is that the usual design best practices do not take remote communication into account. They are based on the implicit assumption that calls to other building blocks do not take any notable time and that they never fail. This is true for in-process communication, and most design best practices stem from this context. [4]

However, remote communication is probabilistic: The duration of remote calls can vary greatly, up to taking forever. And they can fail, i.e., return a technical error because the communication failed, e.g., due to a temporary network partition. The usual design best practices do not take these properties of remote calls into account, which leads to the aforementioned effects if you apply them to the design of processes that communicate remotely.

Note that all this is completely independent of the technical communication style we use:

  • In our example, a customer issues a search request via a frontend or another system.
  • The search service needs to send back the search results within a short period of time to not put off the customer.
  • Before it can send back the search results, the search service needs to access all the other services and get the required data from them. This is due to the tight functional coupling.

This means that the search service needs to access all the other services and retrieve the required data and do its own work and send back the search results, all before the customer gets dissatisfied with the experience.

All this has nothing to do with the technical communication style we use. Going from request-response-based communication to message-based communication does not change anything. The constraints and consequences stay the same, no matter which style we use.
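To make this tangible, here is a hedged Python sketch of the same interaction in a message-based style. The two `asyncio.Queue` objects stand in for a message broker, and `pricing_worker`, `search_via_messaging`, the `pricing_up` switch and the 50 ms deadline are hypothetical names and values chosen for illustration. Sending the request message returns immediately, yet the search result still cannot be assembled before the reply message arrives.

```python
import asyncio


async def pricing_worker(requests, replies):
    """Hypothetical pricing service that consumes request messages."""
    while True:
        product = await requests.get()
        await replies.put((product, 42.0))


async def search_via_messaging(product, pricing_up, deadline=0.05):
    # The two queues stand in for a message broker.
    requests, replies = asyncio.Queue(), asyncio.Queue()
    worker = None
    if pricing_up:
        worker = asyncio.create_task(pricing_worker(requests, replies))

    await requests.put(product)  # asynchronous send: returns at once
    try:
        # The search result cannot be built before the reply arrives.
        _, price = await asyncio.wait_for(replies.get(), deadline)
        return {"product": product, "price": price}
    except asyncio.TimeoutError:
        return None  # no search result within the customer's patience
    finally:
        if worker is not None:
            worker.cancel()
```

With `pricing_up=False`, the request message is queued without any error, but the search still returns nothing within the deadline. The functional dependency is unchanged; only the transport differs.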

This is due to our functional design, i.e., how we organize the functionality across the different services (processes). If this results in high coupling at the functional level (as in our example), no amount of loose coupling at the technical level helps – which brings us back to the statement at the beginning of this section: Unless we ensure loose functional coupling, caring about technical coupling is pointless.

In other words:

We need to ensure loose coupling at both levels, the technical and the functional level, to actually achieve loose coupling.

Going for message-based communication without making sure we are loosely coupled at a functional level just adds accidental complexity: We have more moving parts in our infrastructure. We need to deal with new failure modes. Reasoning about the system state becomes harder. And we are still not loosely coupled.

Summing up

We have seen that coupling is a big issue in software design. We typically need some degree of coupling to get a job done but tight coupling has a series of drawbacks. Therefore, we try to keep the coupling between system parts as low as possible and especially try to avoid accidental coupling, i.e., coupling that is not needed to get the job done.

Here, we focused on a specific type of coupling, the coupling between processes in a distributed system, and started to discuss it:

  • We discussed the fallacy that loose technical coupling, i.e., using a message-based communication style, is sufficient to ensure loose coupling between processes.
  • We have seen that loose technical coupling is pointless without loose functional coupling.
  • We learnt that we need to implement loose coupling at a technical and a functional level to actually become loosely coupled.

In the second (and last) post of this little blog series (link will follow), we will discuss the redundancy fallacy and the third type of coupling we need to consider in the context of remote communication: temporal coupling. We will see how it can help us achieve loose coupling and how it gives us additional design options. Stay tuned … ;)


  1. W. P. Stevens, G. J. Myers, and L. L. Constantine, “Structured design”, IBM Systems Journal, vol. 13, no. 2, 1974, see, e.g., https://www.academia.edu/58429322/Structured_design

  2. In this context, a “process” means a separate technical runtime unit, usually managed by the underlying OS. It does not mean a process in the sense of a business process, as the term is used, e.g., in the context of process engines.

  3. Note that such measures to reduce tight coupling at a functional level are all business-level decisions. A developer cannot simply decide at implementation time that custom or campaign prices are ignored or that sponsored articles are not shown if a technical problem occurs (i.e., the other process returns a technical error or does not respond in time). It requires a business owner to make such a decision. This is probably the core reason why functional coupling is so often ignored by IT people.

  4. To be precise, a called building block inside the same process context can also fail at a technical level while it is executed. But this means that the process it is part of fails. As the calling building block is part of the same process context, it dies with the failing called building block, i.e., any logic to handle the failure of the called building block will never be executed. The encompassing process just died – caller and callee are both dead. Therefore, it is not required to handle scenarios where a building block inside the same process context fails. This means we can act as if called building blocks never fail.