Hey,
I’m trying to set up a multicluster with headless service support using Linkerd stable-2.12.4, which should allow me to address StatefulSet pods directly from the remote cluster.
I followed the docs here. I’m pretty sure the cluster connection worked before I linked the cluster with the headless option:
linkerd multicluster link --cluster-name eu2 --set "enableHeadlessServices=true"
The linkerd and multicluster checks all pass, and I also see new endpoints being created in the remote cluster when I scale the StatefulSet up or down there.
But when I query the service or a specific pod, I get this error, even though DNS resolves to the local Endpoint IP:
< HTTP/1.1 504 Gateway Timeout
< l5d-proxy-error: Gateway service in fail-fast
I labeled this service to be exported; it is headless (no ClusterIP):
apiVersion: v1
kind: Service
metadata:
  labels:
    mirror.linkerd.io/exported: "true"
  name: linkedapp-svc
  namespace: multiregion
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 8765
  selector:
    app: linkedapp
Here is an example of the endpoints that can be resolved but throw the 504 on connection:
$ k get endpoints -n multiregion linkedapp-svc-eu2 -o yaml | k neat
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    mirror.linkerd.io/remote-gateway-identity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local
    mirror.linkerd.io/remote-svc-fq-name: linkedapp-svc.multiregion.svc.cluster.local
  labels:
    mirror.linkerd.io/cluster-name: eu2
    mirror.linkerd.io/mirrored-service: "true"
  name: linkedapp-svc-eu2
  namespace: multiregion
subsets:
- addresses:
  - hostname: linkedapp-1
    ip: 10.100.125.234
  - hostname: linkedapp-0
    ip: 10.100.220.47
  - hostname: linkedapp-2
    ip: 10.100.29.158
  ports:
  - port: 8765
    protocol: TCP
All this is running between two AWS EKS clusters on v1.22.17 with the default AWS VPC CNI.
The classic NLBs that Linkerd creates are running and accessible; I also let them be recreated multiple times.
I tried a clean reinstall and recreated the NLB, but nothing changed.
I’ve found this discussion, but since this is a fresh installation with no traffic other than my own tests, that shouldn’t be the problem, should it?
Any hints?
Maybe an AWS CNI issue?