Network address issue using Linkerd in AKS

Setup
We’re using an Azure Kubernetes Service cluster (Kubernetes version 1.29)
Helm chart “linkerd-control-plane” version 1.16.11 (app version stable-2.14.10)

Ingress:
Ingress is managed via traefik, which has a fixed IP (10.224.90.68)
Internet (e.g. test.com/target) → AKS → traefik → app-XY serving /target
No service except traefik is exposed directly outside the k8s cluster; ingress is only possible via Ingress resources that route the traffic over traefik.

Our Network:
Entire network: 10.0.0.0/8
AKS worker nodes: 10.224.80.0/21
AKS cluster-nodes: 10.224.90.0/26
AKS internal-lb: 10.224.90.64/27
Every other address within 10.0.0.0/8 is either another cloud service or an on-premises address.

AKS Configuration:
Services Network: 10.0.0.0/16 (default)

Issue
Without Linkerd: None. Everything works perfectly. kube-dns resolves all addresses properly, and all k8s services and on-prem addresses are reachable.

With Linkerd: Since the AKS service network overlaps with the 10.0.0.0/8 network, egress traffic fails whenever the target IP address happens to be the same as a service’s cluster IP.

Example:
The on-premises domain bar.onprem resolves to 10.0.121.215 (correct).
The k8s Service “Foo” has the cluster IP 10.0.121.215 assigned (coincidence).
The application “Test” inside the k8s cluster, which is meshed by Linkerd, tries to access https://bar.onprem.
This scenario fails, as Linkerd routes the traffic to the k8s Service “Foo”, which does not accept the request (see the sketch of the colliding Service below).
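For illustration, the colliding Service from this example would look roughly like this. This is a sketch only: namespace, selector and ports are placeholders, and the cluster IP is normally assigned by Kubernetes rather than set by hand.

```yaml
# Sketch of the colliding Service from the example above.
# The clusterIP is normally assigned by Kubernetes from the 10.0.0.0/16 service
# CIDR; it is shown explicitly here only to make the overlap with bar.onprem
# (10.0.121.215) concrete. Namespace, selector and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: foo
  namespace: default
spec:
  clusterIP: 10.0.121.215
  selector:
    app: foo
  ports:
    - port: 443
      targetPort: 8443
```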

Why is it an issue
Yes, I could move the service network to a reserved address space that does not overlap with other addresses. But that would require redeploying the AKS cluster, since the service CIDR can only be set at creation time, and the overlap is only a problem when Linkerd is in use: without Linkerd, AKS does not care about the overlapping spaces and manages the traffic properly. Reserving yet another subnet also feels like a waste, as we are already short on address space.

What did I already try
I set the Helm parameter clusterNetworks to 10.224.80.0/21 (the worker-node subnet) to try to exempt the k8s services from the DNS/meshing, but the result didn’t change. I’m also unsure whether meshing would even still work properly with this setting; a sketch of the override is shown below.
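For reference, the override looks roughly like this in the values for the “linkerd-control-plane” chart (a simplified sketch; the chart’s default clusterNetworks covers 10.0.0.0/8 among other private ranges):

```yaml
# Simplified sketch of the Helm values override for the linkerd-control-plane chart.
# The chart default includes 10.0.0.0/8, which covers our entire corporate range;
# here it is narrowed to the worker-node subnet in the hope that the proxy stops
# treating colliding 10.0.x.x addresses as in-cluster destinations.
clusterNetworks: "10.224.80.0/21"
```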

Hi @Niklas,

Hopefully understanding the way Linkerd routes traffic can help us figure out a solution. Let’s start with a high-level overview of Linkerd’s outbound call routing:

Linkerd’s handling of an outbound call depends on whether the client is dialing a service FQDN or a direct endpoint.

If the client is making a call to a service FQDN, the client will:

  1. Make a DNS query to the local Kubernetes DNS resolver and obtain the in-cluster service IP.
  2. Once it has the IP, attempt to initiate a connection to that IP; the connection will be intercepted by Linkerd.
  3. At this point Linkerd will check with the destination service if it’s a known and valid IP, if it resolves to a svc or an endpoint, and make some additional checks to make sure we know where we’re routing the traffic.
  4. If it’s a svc IP (like a clusterIP), Linkerd checks a list of all known endpoints associated with that service and then picks the best one to initiate a connection to, based on the internal LB algorithm (by default it’ll be EWMA).

If the client is making a call to an FQDN that resolves directly to an endpoint (pod), or to a pod IP, it goes through the same Kubernetes DNS lookup and connection initiation process. However, Linkerd will confirm during the destination lookup phase that it’s a direct endpoint IP and then route straight to that endpoint without making any additional load-balancing decisions (the idea is to respect the client’s choice of endpoint, which is valid in certain cases, especially when dealing with ingresses).

The recommended way to resolve this would be either to fix the overlapping subnets, or to amend the Kubernetes DNS config so it differentiates between external and internal addresses and sets a priority for resolution (e.g., tell Kubernetes DNS to ignore in-cluster addresses for certain overlapping external subnets). Note, though, that this may cause issues when you actually do need to route to those services internally and their addresses overlap. A rough sketch of the DNS approach is below.
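For the DNS route on AKS specifically, one way to steer resolution of the on-prem zone is the coredns-custom ConfigMap that AKS provides for CoreDNS customization. The zone name and resolver address below are placeholders based on your example; treat this as a sketch of the idea rather than a drop-in config:

```yaml
# Sketch only: AKS allows extending CoreDNS via a coredns-custom ConfigMap in
# kube-system. This adds a dedicated server block that forwards the on-prem
# zone to an on-prem resolver, so those names are always answered externally.
# The zone ("onprem") and the resolver IP are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  onprem.server: |
    onprem:53 {
        errors
        cache 30
        forward . 10.0.0.53    # placeholder: on-prem DNS server
    }
```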