Setup
We’re running an Azure Kubernetes Service (AKS) cluster (k8s version 1.29)
Helm chart “linkerd-control-plane” version 1.16.11 (App Version stable-2.14.10)
Ingress:
Ingress is managed via traefik, which has a fixed IP (10.224.90.68)
Internet (e.g. test.com/target) → AKS → traefik → app-XY serving /target
No service except traefik is exposed directly outside the k8s cluster. Ingress is only possible through Ingress resources that route the traffic via traefik.
Our Network:
Entire network: 10.0.0.0/8
AKS worker nodes: 10.224.80.0/21
AKS cluster-nodes: 10.224.90.0/26
AKS internal-lb: 10.224.90.64/27
Every other address within 10.0.0.0/8 is either another cloud service or an on-premises address.
AKS Configuration:
Services Network: 10.0.0.0/16 (default)
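To make the overlap explicit, here is a minimal sketch using Python's standard `ipaddress` module. The CIDRs are copied from the lists above; the variable and function names are made up for illustration.

```python
import ipaddress

# Ranges copied from the setup described above.
entire_network = ipaddress.ip_network("10.0.0.0/8")
service_network = ipaddress.ip_network("10.0.0.0/16")  # AKS default
aks_ranges = {
    "worker nodes": ipaddress.ip_network("10.224.80.0/21"),
    "cluster-nodes": ipaddress.ip_network("10.224.90.0/26"),
    "internal-lb": ipaddress.ip_network("10.224.90.64/27"),
}


def overlap_report():
    """Map each named range to whether it overlaps the service network."""
    report = {"entire network": service_network.overlaps(entire_network)}
    for name, net in aks_ranges.items():
        report[name] = service_network.overlaps(net)
    return report


if __name__ == "__main__":
    for name, hit in overlap_report().items():
        status = "overlaps" if hit else "does not overlap"
        print(f"{name}: {status} with {service_network}")
```

The AKS node and load-balancer ranges do not touch the service network, but the service network sits entirely inside 10.0.0.0/8, so any on-premises or cloud address in 10.0.0.0/16 can coincide with a cluster-IP.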
Issue
Without Linkerd: None. Everything works perfectly: kube-dns resolves all addresses properly, and all k8s services and on-prem addresses are reachable.
With Linkerd: Because the AKS service network overlaps with the 10.0.0.0/8 network, egress traffic fails whenever the target IP address happens to be the same as a service’s cluster-IP.
Example:
on-premises domain bar.onprem resolves to 10.0.121.215 (correct)
k8s Service “Foo” has a cluster-IP 10.0.121.215 assigned. (coincidence)
Application “Test” inside the k8s cluster, which is meshed by Linkerd, tries to access https://bar.onprem.
This scenario fails, because Linkerd routes the traffic to the k8s Service “Foo”, which does not accept the request.
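The collision in this example can be reproduced as a toy model. This is not Linkerd's actual implementation, only a sketch of what happens when a destination is identified by IP alone; the `DNS` and `CLUSTER_IPS` tables and the `classify` helper are hypothetical stand-ins for the values above.

```python
import ipaddress

# Values taken from the example above; the tables themselves are hypothetical.
SERVICE_CIDR = ipaddress.ip_network("10.0.0.0/16")
CLUSTER_IPS = {ipaddress.ip_address("10.0.121.215"): "Foo"}  # k8s Service "Foo"
DNS = {"bar.onprem": ipaddress.ip_address("10.0.121.215")}   # on-prem record


def classify(hostname):
    """Describe how an IP-only lookup would see the resolved destination."""
    ip = DNS[hostname]
    if ip in CLUSTER_IPS:
        return f"{hostname} -> {ip}: matches cluster-IP of Service {CLUSTER_IPS[ip]}"
    if ip in SERVICE_CIDR:
        return f"{hostname} -> {ip}: inside the service network, no Service assigned"
    return f"{hostname} -> {ip}: outside the service network"


if __name__ == "__main__":
    print(classify("bar.onprem"))
```

Once only the IP is considered, the on-prem host and the Service “Foo” are indistinguishable, which matches the behavior described above.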
Why is it an issue
Yes, I could move the services network to a reserved address space that does not overlap with any other addresses. But that would require redeploying the AKS cluster, as this can only be set at creation time. The problem also only occurs with Linkerd: without Linkerd, AKS does not care about the overlapping address spaces and handles the traffic properly. Reserving yet another subnet seems a bummer, as we are already short on address space.
What did I already try
I set the Helm parameter clusterNetworks to 10.224.80.0/21 (worker nodes) to try to exempt the k8s services from the DNS/meshing, but the result didn’t change. I’m also unsure whether the meshing would even still work properly with this setting.
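As a sanity check on that setting, the sketch below tests which addresses fall inside the configured value. It assumes clusterNetworks is a comma-separated list of CIDRs (my reading of the chart values, not verified here); the `covered` helper is made up for illustration and says nothing about how the proxy actually uses the list.

```python
import ipaddress


def covered(target, cluster_networks):
    """True if target (an address or CIDR) lies inside one of the listed CIDRs."""
    nets = [ipaddress.ip_network(c.strip()) for c in cluster_networks.split(",")]
    target_net = ipaddress.ip_network(target)  # a bare address becomes a /32
    return any(target_net.subnet_of(net) for net in nets)


tried = "10.224.80.0/21"  # the value set via Helm above

if __name__ == "__main__":
    print(covered("10.0.121.215", tried))  # colliding address -> False
    print(covered("10.0.0.0/16", tried))   # service network -> False
    print(covered("10.224.90.68", tried))  # traefik IP -> False
```

With this value, neither the service network nor the colliding address is part of the configured range, so the membership check alone does not explain why the behavior stayed the same.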
