Stable-2.12.4 - Multicluster Headless Connection "504 Gateway Timeout"

seb · May 3, 2023, 2:47pm

Hey,
I'm trying to set up multicluster with headless service support using Linkerd stable-2.12.4, which should allow me to address StatefulSet pods in the remote cluster directly.

I followed the docs here. I'm pretty sure the cluster connection worked before I linked the cluster with the headless option:

linkerd multicluster link --cluster-name eu2 --set "enableHeadlessServices=true"

The linkerd and multicluster checks all pass, and I also see new endpoints being created when I scale the StatefulSet up or down in the remote cluster. But when I query the service or a specific pod, I get this error, even though DNS resolves to the local Endpoint IP:

< HTTP/1.1 504 Gateway Timeout
< l5d-proxy-error: Gateway service in fail-fast

I labeled this service for export, with no ClusterIP (headless):

apiVersion: v1
kind: Service
metadata:
  labels:
    mirror.linkerd.io/exported: "true"
  name: linkedapp-svc
  namespace: multiregion
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 8765
  selector:
    app: linkedapp
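A minimal sketch (not part of the original post) of how one might sanity-check a Service before expecting it to be mirrored as headless. It assumes the manifest was dumped as JSON (e.g. with `kubectl get svc linkedapp-svc -n multiregion -o json`) and only checks the two properties discussed here: the `mirror.linkerd.io/exported` label and `clusterIP: None`. The function name `is_exported_headless` is hypothetical, and the link itself must still be created with `enableHeadlessServices=true`.

```python
import json

# Label that marks a Service for export, as used in the manifest above.
EXPORT_LABEL = "mirror.linkerd.io/exported"


def is_exported_headless(svc: dict) -> bool:
    """True if the Service is labeled for export and has no ClusterIP (headless)."""
    labels = (svc.get("metadata") or {}).get("labels") or {}
    spec = svc.get("spec") or {}
    return labels.get(EXPORT_LABEL) == "true" and spec.get("clusterIP") == "None"


# The relevant fields of the Service above, as `kubectl get svc -o json` would emit them.
manifest = json.loads("""
{
  "metadata": {"name": "linkedapp-svc", "namespace": "multiregion",
               "labels": {"mirror.linkerd.io/exported": "true"}},
  "spec": {"clusterIP": "None", "ports": [{"port": 8765}]}
}
""")
print(is_exported_headless(manifest))  # True
```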

Here are the mirrored endpoints, which resolve but return the 504 on connection:

$ k get endpoints -n multiregion linkedapp-svc-eu2 -o yaml | k neat
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    mirror.linkerd.io/remote-gateway-identity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local
    mirror.linkerd.io/remote-svc-fq-name: linkedapp-svc.multiregion.svc.cluster.local
  labels:
    mirror.linkerd.io/cluster-name: eu2
    mirror.linkerd.io/mirrored-service: "true"
  name: linkedapp-svc-eu2
  namespace: multiregion
subsets:
- addresses:
  - hostname: linkedapp-1
    ip: 10.100.125.234
  - hostname: linkedapp-0
    ip: 10.100.220.47
  - hostname: linkedapp-2
    ip: 10.100.29.158
  ports:
  - port: 8765
    protocol: TCP

All of this runs on two AWS EKS clusters (Kubernetes v1.22.17) with the default AWS VPC CNI.
The classic NLBs that Linkerd creates are running and accessible, and I have had them recreated multiple times.
I also tried a clean reinstall and recreating the NLB, but nothing changed.

I found this discussion, but since this is a fresh installation with no traffic other than my tests, that shouldn't be the problem, right?

Any hints?
Maybe an AWS CNI issue?

Alex · May 3, 2023, 6:59pm

Hi @seb

Are the requests you're making from the source cluster coming from inside a meshed pod? Since all multicluster traffic must be encrypted with mTLS, only meshed pods can query the mirrored services.
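One rough way to confirm that a client pod is meshed is to look for the injected `linkerd-proxy` container in its spec (for example from `kubectl get pod <name> -n multiregion -o json`). This is a heuristic sketch with a hypothetical helper name: it checks container names only, not whether the proxy is ready or holds a valid identity. It looks in both `containers` and `initContainers` on the assumption that newer Linkerd versions may run the proxy as a native sidecar.

```python
def is_meshed(pod: dict) -> bool:
    """Heuristic: True if the pod spec contains a container named `linkerd-proxy`."""
    spec = pod.get("spec") or {}
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    return any(c.get("name") == "linkerd-proxy" for c in containers)


# Hypothetical pod specs reduced to container names.
injected = {"spec": {"containers": [{"name": "linkedapp"}, {"name": "linkerd-proxy"}]}}
plain = {"spec": {"containers": [{"name": "linkedapp"}]}}

print(is_meshed(injected), is_meshed(plain))  # True False
```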

I’d also recommend taking a look at the linkerd-gateway logs in the target cluster. These logs may give you some clue as to why the gateway service is in fail-fast.

seb · May 4, 2023, 6:11am

Hi @Alex ,
thanks for your response!
Yes, both clusters have an identical namespace ("multiregion" in this case), and Linkerd is injected into all pods in that namespace.

I'll rebuild everything from scratch again and also take a look at the gateway logs. I'll keep you posted!

seb · May 10, 2023, 2:59pm

Hey @Alex ,

I just spent some more time on this, wiped the complete installation again, and reinstalled it, but I still have the same issue.

Checking the linkerd-gateway logs in the target cluster, I see the same errors as described here:

INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed error=direct connections must be mutually authenticated error.sources=[direct connections must be mutually authenticated] client.addr=192.168.100.175:50015
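If these closures come from load-balancer health checks rather than real multicluster clients, they would likely repeat from a small set of VPC source addresses (such as `192.168.100.175` above). A small sketch that tallies such lines by client IP, assuming the gateway proxy's log lines are fed in (e.g. via `kubectl logs` on the `linkerd-gateway` deployment in `linkerd-multicluster`; the exact container to read is an assumption). The regex handles IPv4 only, and `unauthenticated_clients` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the INFO line shown above and captures the client IPv4 address (port dropped).
PATTERN = re.compile(
    r"direct connections must be mutually authenticated.*client\.addr=(\d+\.\d+\.\d+\.\d+):\d+"
)


def unauthenticated_clients(lines) -> Counter:
    """Count 'must be mutually authenticated' connection closures per client IP."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


prefix = ("INFO ThreadId(01) inbound: linkerd_app_core::serve: Connection closed "
          "error=direct connections must be mutually authenticated "
          "error.sources=[direct connections must be mutually authenticated] ")
sample = [
    prefix + "client.addr=192.168.100.175:50015",
    prefix + "client.addr=192.168.100.175:50016",
    "INFO ThreadId(01) outbound: an unrelated line",
]
print(unauthenticated_clients(sample))  # Counter({'192.168.100.175': 2})
```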

Any more ideas?
I tried to fix the health probes on the AWS LB, but that hasn't helped so far…

EDIT: Talking to the headless services finally works!
In the end, I had to restart all involved pods.

However, I still see the error above in the gateway logs. Even though it doesn't block multicluster communication, I'd still like to resolve it. Is there a ready-to-apply fix for the LB health probes?

Alex · May 10, 2023, 6:16pm

Glad to hear that you got multicluster traffic working!!!

As for the errors in the logs, we have an open issue tracking this (Linkerd Gateway logs spammed with "connections must be mutually authenticated" from kube-system probes · Issue #10203 · linkerd/linkerd2 · GitHub)
