The linkerd-enterprise-windows-cni helm chart does not support tolerations. The normal use case, as I see it, is to install it on the Windows nodes of a k8s cluster that has both Linux and Windows nodes. The standard pattern when using Windows nodes is to always taint them with:
taints:
  - effect: NoSchedule
    key: kubernetes.io/os
    value: windows
But the linkerd-enterprise-windows-cni helm chart neither ships any default tolerations nor supports adding them through values.
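For reference, what I had in mind is being able to set something like this in the chart values (a sketch only: the tolerations key is my suggestion for what the chart could pass through to the DaemonSet pod spec, not something it accepts today):

tolerations:
  - key: kubernetes.io/os
    operator: Equal
    value: windows
    effect: NoSchedule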
I manually hacked it to test while waiting. The next problem is that we have already upgraded the control plane to 2.19.1, and the proxy-win image does not exist for that version. Can you please fix that as well?
But the linkerd-network-validator init container still fails when I try to mesh a pod:
2025-11-18T13:48:18.183963Z INFO linkerd_network_validator: Listening for connections on 0.0.0.0:4140
2025-11-18T13:48:18.184243Z DEBUG linkerd_network_validator: token="mytoken\n"
2025-11-18T13:48:18.184258Z INFO linkerd_network_validator: Connecting to 1.1.1.1:20001
2025-11-18T13:48:28.183712Z ERROR linkerd_network_validator: Failed to validate networking configuration. Please ensure traffic redirection rules are rewriting traffic as expected. timeout=10s
stream closed: EOF for horizons/metadata-api-6fd6b666cc-8zw5h (linkerd-network-validator)
@william I'm available to help you guys get this working. It is indeed an alpha version, but I had a bit higher hopes. Nothing much is working.
Thanks for the feedback. Going from top to bottom:
We had a problem with our build infra, which has now been fixed, so the 2.19.1 Windows image should now be updated.
Currently, the DS installer of the Windows CNI has a node selector with kubernetes.io/os: windows. Is that not working for you? Why is there a need for tainting?
At the moment, this plugin has been tested on AKS only, so it might have some assumptions embedded about how CNI plugins are chained on Windows nodes.
From the logs you've shared, it seems that no error is being thrown, yet the network is not being configured. Could you please share the following (I've sketched a few commands after the list that might help collect them):
The manifest of the workload that is being injected.
The environment you are trying to run it in.
The CNI plugin should have a log named linkerd2-windows-cni.log in the bin directory of the CNI plugin. Can you verify that this is the case and share its contents?
The configuration of any other CNI plugins that are installed on the node.
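If it helps, something along these lines should gather most of that (the on-node paths are assumptions based on a typical AKS Windows node; adjust them to wherever the CNI binaries and config actually live on your nodes):

# Manifest of the workload being injected (using the pod from your validator log)
kubectl -n horizons get pod metadata-api-6fd6b666cc-8zw5h -o yaml

# On the Windows node itself, from PowerShell:
Get-Content C:\k\azurecni\bin\linkerd2-windows-cni.log      # CNI plugin log (bin path assumed)
Get-Content C:\k\azurecni\netconf\*.conflist                # chained CNI plugin configuration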
Taints are needed because most Linux workloads do not set a nodeSelector. The default node pools in a cluster are Linux and untainted, so by default all pods land there. If Windows nodes are added without taints, those same Linux pods get scheduled onto the Windows nodes and chaos ensues.
Common practice for Windows nodes is to have:
taints:
  - effect: NoSchedule
    key: kubernetes.io/os
    value: windows
The rest of the log is filled with the same errors about fluentbit.
Actually, looking more at this cluster now, I see that the linkerd-windows-cni is causing fluentbit pods to be stuck in ContainerCreating state, with this error:
Warning FailedCreatePodSandBox 3m22s (x568 over 131m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ac794de392a18710ced67f7c411efe9b86a88fd4318c135f33e8deeeeef0de7e": plugin type="linkerd2-windows-cni" name="0.4.0" failed (add): Unauthorized
That is a serious bug: the CNI plugin is causing problems for other, non-meshed applications. There are currently no meshed Windows pods on this cluster, and fluentbit is not intended to be meshed either.
We can get back to the injected application manifest after these other things are sorted out. The deployment manifests are very standard, with the tolerations and nodeSelector needed to get the pods scheduled on Windows nodes.
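Roughly, the scheduling-related part of each Windows deployment looks like this (a representative sketch of the pattern, not the exact manifest):

spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      tolerations:
        - key: kubernetes.io/os
          operator: Equal
          value: windows
          effect: NoSchedule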
The only other CNI on the nodes is the standard azure-cns-win.
Thanks a lot for the detailed response. Let's get this working!
OK, let's leave taints aside for a bit and sort out the other problems first. It appears to me that this is an RBAC issue where the CNI plugin does not have Kube API permissions to perform a GET on the pod that is being considered.
We can make things more fault tolerant so that this situation does not break pod creation. However, the question still stands: why does the plugin have no permissions?
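For reference, the kind of permission the plugin relies on is roughly this (an illustrative sketch; the actual ClusterRole and binding shipped with the chart may be named and scoped differently):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: linkerd-windows-cni        # illustrative name
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]

A quick way to confirm what the plugin's identity is allowed to do is kubectl auth can-i get pods --as=system:serviceaccount:<cni-namespace>:<cni-serviceaccount>, substituting the actual namespace and service account used by the CNI installer.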
The installer is supposed to create a kubeconfig file for the plugin at C:\k\azurecni\netconf\linkerd-windows-cni-kubeconfig. Is this file present, and what does it look like? This file carries the credentials that determine the plugin's permissions. Can we take a look at it?
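For reference, that file should have the usual kubeconfig shape, roughly like this (all names and values here are placeholders; the actual entries the installer writes may differ):

apiVersion: v1
kind: Config
clusters:
  - name: linkerd-cni
    cluster:
      server: https://<kube-apiserver-address>
      certificate-authority-data: <base64-encoded CA>
users:
  - name: linkerd-cni
    user:
      token: <service account JWT>        # this is the token whose permissions matter
contexts:
  - name: linkerd-cni-context
    context:
      cluster: linkerd-cni
      user: linkerd-cni
current-context: linkerd-cni-context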
OK, can you share the exact environment you are using for this AKS cluster? I will try to reproduce the problem myself. Can you use this kubeconfig to perform any operations on the cluster, for example from your local machine? Have you changed anything about the RBAC that was installed as part of this CNI plugin?
Not sure what to answer about the AKS setup. Pretty standard with mixed node pools.
I did not change anything related to RBAC in the Linkerd CNI installation.
So I followed the tutorial and was not able to reproduce the problem. I suspect we have a bug in handling expired service account tokens for the plugin. Can you please validate the following: in your linkerd-windows-cni-kubeconfig file, check that the JWT token populated under linkerd-cni is not expired. If it is expired, do a rollout restart of your CNI installer DaemonSet and see whether that fixes the problem. But please check the expiration of the token first.
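In case it is useful, here is one way to check the expiry, assuming the token sits in a plain token: field in that file (run wherever you have a copy of the file and standard GNU tools):

# Pull the token out of the kubeconfig (field name assumed to be "token:")
TOKEN=$(grep 'token:' linkerd-windows-cni-kubeconfig | awk '{print $2}' | tr -d '"')
# The JWT payload is the second dot-separated segment, base64url-encoded without padding
PAYLOAD=$(printf '%s' "$TOKEN" | cut -d '.' -f 2 | tr '_-' '/+')
# Re-add padding so base64 can decode it, then read the "exp" claim from the output
PAD=$(( (4 - ${#PAYLOAD} % 4) % 4 ))
printf '%s%s' "$PAYLOAD" "$(printf '%*s' "$PAD" '' | tr ' ' '=')" | base64 -d; echo
# "exp" is in seconds since the epoch; if it is smaller than the current time, the token has expired
date +%s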