What are failfast errors in Linkerd and how do you debug them?

william · April 11, 2023, 8:38pm

One common error message in Linkerd logs involves “failfast” (also written fail-fast or fail fast). What is failfast, and what does it mean for you?

Let’s find out.

What is failfast?

If you’re using Linkerd, you might encounter failfast in error messages like these:

outbound:accept{client.addr=172.16.26.21:42480}:ingress{addr=172.16.108.124:4444}:http{v=1.x}:override{dst=linkerd-communication.linkerd-dev.svc.cluster.local:80}: linkerd_stack::failfast: HTTP Logical service has become unavailable

outbound:accept{client.addr=172.30.1.53:50162}:proxy{addr=172.28.70.43:8080}:http{v=1.x}:logical{dst=some-service.namespace.svc.cluster.local:8080}:concrete{addr=some-service.namespace.svc.cluster.local:8080}: linkerd_stack::failfast: HTTP Balancer in failfast

outbound:accept{client.addr=10.244.5.133:33824}:proxy{addr=10.245.112.110:3306}: linkerd_stack::failfast: TCP Server service has become unavailable

inbound:accept{client.addr=10.244.5.133:33824}:proxy{addr=10.245.112.110:3306}:tcp: linkerd_stack::failfast: TCP Logical service has become unavailable

Failfast is a state that Linkerd enters when it is unable to reach a destination that it was asked to proxy a request to. Once it’s in failfast, further requests to that destination are immediately returned to the caller as failures, until the destination actually becomes available.

Failfast is an implementation detail of the proxy. However, because it often shows up in log messages when something is going wrong, we see many questions about what failfast actually is. When debugging a failfast message, the most important thing to understand is that failfast is not an error by itself. Failfast is a symptom of an underlying problem. If you see a failfast error, don’t blame Linkerd. Look at what Linkerd is trying to connect to: the problem is probably there!

Common ways that failfast happens

Failfast can happen in two ways.

First, failfast can happen on the inbound (server) side, when Linkerd is proxying an incoming request to the local app container. In this case, it means that Linkerd is unable to reach the app container on the given port. For example, if pod A tries to connect to meshed pod B on port 1234, and the application container in pod B doesn’t listen on port 1234, then the proxy in pod B would log a server-side failfast error for port 1234. It tried to connect to port 1234 on the local app container, but couldn’t, and future requests to port 1234 will be immediately failed by Linkerd. (Of course, Linkerd will periodically re-check whether port 1234 is open, and if the app container later opens that port, will leave fail-fast mode and proxy the connections as expected.)

The other way failfast can happen is on the outbound (client) side, when Linkerd is asked to proxy a request from the local app container to a destination somewhere else. For example, if meshed pod A wants to connect to port 1234 on pod B, and pod B is not listening on port 1234, then the proxy in pod A would log a client-side failfast error. More commonly, if meshed pod A is trying to connect to a service C on port 1234, and C has no endpoints available, then the proxy in pod A will enter failfast.

Why is failfast necessary?

Failfast is really an optimization in Linkerd. Every time Linkerd tries to establish a connection, it sets a timeout on the operation (by default, 10 seconds). If Linkerd can’t connect within 10 seconds, it considers that a failure.

So failfast is simply there so that Linkerd (and your application) can avoid waiting for that timeout to occur any more than is strictly necessary.
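
To make the idea concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not how the Linkerd proxy is actually implemented. The names FailfastGate and try_connect are made up for illustration, and the real proxy re-checks the destination on its own schedule rather than once per request as this model does.

```python
import time


class FailfastGate:
    """Toy model of the failfast idea described above (not Linkerd's code)."""

    def __init__(self, try_connect, timeout=10.0, poll=0.1,
                 clock=time.monotonic, sleep=time.sleep):
        self.try_connect = try_connect  # hypothetical probe: True if reachable
        self.timeout = timeout          # seconds a caller may wait to connect
        self.poll = poll                # pause between connection attempts
        self.clock = clock
        self.sleep = sleep
        self.failfast = False

    def call(self):
        if self.failfast:
            # Already known to be unreachable: one quick re-check, no waiting.
            if not self.try_connect():
                return "failfast"
            self.failfast = False
            return "ok"
        deadline = self.clock() + self.timeout
        while self.clock() < deadline:
            if self.try_connect():
                return "ok"
            self.sleep(self.poll)
        # The timeout elapsed: remember that, so later callers fail at once.
        self.failfast = True
        return "timeout"
```

In this model only the first caller pays the full timeout. Every later caller gets an immediate failure until a re-check succeeds, at which point the gate leaves failfast and requests flow again.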

Common situations that lead to failfast errors

The service you’re trying to connect to doesn’t select over any pods.
Every pod behind the service is unavailable, for example because they’re all failing their liveness probes and being restarted.
You’re trying to connect to a service on a port that it doesn’t use.

How do I debug failfast errors in Linkerd?

The failfast log line will typically include either inbound or outbound near the beginning of the string. This tells you whether it’s server-side (inbound) or client-side (outbound).
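
If you have a lot of log lines to sift through, a small script can do this classification for you. The following is a rough sketch that assumes log lines shaped like the examples earlier in this post. The function name classify_failfast and the "span.key" naming of the extracted fields are my own choices, and it assumes that multiple fields inside one {...} span are separated by whitespace.

```python
import re

# A span looks like name{key=value ...}, e.g. logical{dst=host:port}.
_SPAN = re.compile(r"(\w+)\{([^}]*)\}")
# The direction token that precedes the first span.
_SIDE = re.compile(r"\b(inbound|outbound):")
_MARKER = "linkerd_stack::failfast:"
_SIDES = {"inbound": "server-side", "outbound": "client-side"}


def classify_failfast(line):
    """Return side, span fields, and message for a failfast log line.

    Returns None if the line is not a failfast message.
    """
    if _MARKER not in line:
        return None
    context, _, message = line.partition(_MARKER)
    side_match = _SIDE.search(context)
    fields = {}
    for span, body in _SPAN.findall(context):
        for pair in body.split():
            key, sep, value = pair.partition("=")
            if sep:
                # Prefix with the span name, since "addr" can appear twice.
                fields[f"{span}.{key}"] = value
    return {
        "side": _SIDES[side_match.group(1)] if side_match else "unknown",
        "fields": fields,
        "message": message.strip(),
    }
```

For an outbound line, fields such as logical.dst or concrete.addr point at the destination Linkerd could not reach. For an inbound line, proxy.addr shows the address and port it was trying to reach.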

For server-side failfast, there’s really only one problem: your application isn’t responding on the port Linkerd is trying to connect to it on. Figure out why.
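
One quick way to confirm this is to check whether anything accepts TCP connections on that port. Below is a minimal sketch using only the Python standard library. The helper name port_accepts_connections is hypothetical, and it assumes you can run Python somewhere that shares the pod’s network (for example, inside the app container), which may not be true for your image.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If port_accepts_connections("127.0.0.1", 1234) returns False from inside the pod, the application is not listening on port 1234, and that is the thing to fix.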

For client-side failfast, you can use the linkerd diagnostics endpoints command to list out the endpoints that Linkerd is aware of for a given service. For example,

linkerd diagnostics endpoints emoji-svc.emojivoto.svc.cluster.local:8080 web-svc.emojivoto.svc.cluster.local:80

Hopefully this will give you a clue as to why Linkerd is unable to connect to the destination. For example, if no endpoints are listed for the service, Linkerd has nothing to connect to.

Good luck and happy debugging!
