I'm trying to set up multicluster with headless service support using Linkerd stable-2.12.4, which should allow me to address StatefulSet pods directly from the remote cluster.
I followed the docs here. I'm pretty sure the cluster connection worked before I linked the cluster with the headless option:
```
linkerd multicluster link --cluster-name eu2 --set "enableHeadlessServices=true"
```
The linkerd + multicluster checks are all happy, and I also see new mirror endpoints being created when I scale the StatefulSet up/down in the remote cluster. But when I query the service or a specific pod, I get this error, even though DNS resolves to the local endpoint IP:
```
< HTTP/1.1 504 Gateway Timeout
< l5d-proxy-error: Gateway service in fail-fast
```
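For reference, the names I'm querying follow the mirrored-service naming scheme (remote service name plus the cluster-name suffix). This small sketch just builds the two forms; pod, service, and namespace names are from my setup, the port is the service port:

```shell
# Mirrored headless service created by the link: linkedapp-svc-eu2,
# backed by StatefulSet pods linkedapp-{0,1,2} in namespace multiregion.
svc="linkedapp-svc-eu2"
ns="multiregion"

# Querying the whole (headless) service:
echo "${svc}.${ns}.svc.cluster.local:8765"

# Querying one specific pod behind it:
pod="linkedapp-0"
echo "${pod}.${svc}.${ns}.svc.cluster.local:8765"
```

Both names resolve for me; the 504 happens on the actual connection.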
I labeled this service in the remote cluster to be exported:
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    mirror.linkerd.io/exported: "true"
  name: linkedapp-svc
  namespace: multiregion
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 8765
  selector:
    app: linkedapp
```
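For context, the workload behind this service on the remote side is a plain StatefulSet, roughly like the sketch below. The container name and image are placeholders; only `serviceName`, the selector, the replica count, and the port match my setup:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: linkedapp
  namespace: multiregion
spec:
  serviceName: linkedapp-svc       # must match the headless service above
  replicas: 3
  selector:
    matchLabels:
      app: linkedapp
  template:
    metadata:
      labels:
        app: linkedapp
    spec:
      containers:
      - name: linkedapp                      # placeholder name
        image: example/linkedapp:latest      # placeholder image
        ports:
        - containerPort: 8765
```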
Here is an example of the mirrored endpoints that resolve fine but return the 504 on connection:
```yaml
$ k get endpoints -n multiregion linkedapp-svc-eu2 -o yaml | k neat
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    mirror.linkerd.io/remote-gateway-identity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local
    mirror.linkerd.io/remote-svc-fq-name: linkedapp-svc.multiregion.svc.cluster.local
  labels:
    mirror.linkerd.io/cluster-name: eu2
    mirror.linkerd.io/mirrored-service: "true"
  name: linkedapp-svc-eu2
  namespace: multiregion
subsets:
- addresses:
  - hostname: linkedapp-1
    ip: 10.100.125.234
  - hostname: linkedapp-0
    ip: 10.100.220.47
  - hostname: linkedapp-2
    ip: 10.100.29.158
  ports:
  - port: 8765
    protocol: TCP
```
All this is running between two AWS EKS clusters on v1.22.17 with the default AWS VPC CNI.
The classic NLBs that Linkerd creates are up and reachable, and I have let them be recreated multiple times. A clean reinstall, including recreating the NLBs, doesn't change anything either.
I've found this discussion, but since this is a fresh installation with nothing but my test traffic, that shouldn't be the problem, should it?
Maybe an AWS CNI issue?