Linkerd destination pod unable to start when using ArgoCD since Linkerd v2.13.0

We’ve recently upgraded to Linkerd v2.13.0, and we’re seeing the following warning in Argo (we use GitOps; Linkerd is deployed with Helm):

This seems to cause the following error(s) in the linkerd-destination pod:

Error: failed to get lease: ApiError: leases.coordination.k8s.io "policy-controller-write" not found: NotFound (ErrorResponse { status: "Failure", message: "leases.coordination.k8s.io \"policy-controller-write\" not found", reason: "NotFound", code: 404 })


Caused by:
 0: ApiError: leases.coordination.k8s.io "policy-controller-write" not found: NotFound (ErrorResponse { status: "Failure", message: "leases.coordination.k8s.io \"policy-controller-write\" not found", reason: "NotFound", code: 404 })
 1: leases.coordination.k8s.io "policy-controller-write" not found: NotFound

Is there any way to fix this within Linkerd, or is it possible to fix this in ArgoCD?

If more information is needed, please let me know and I will provide it :slight_smile:

Hey :wave:

In 2.13, we introduced leader election for the policy-controller. The policy-controller now updates resource statuses (HttpRoute resources may contain parentRefs and backendRefs, the status indicates whether the references are valid). Only one controller may write/patch resources. Leader election is implemented using Lease objects.
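To illustrate the idea, here is a minimal in-memory sketch of lease-based leader election. It is not the policy-controller's implementation: the `Lease` class, the `try_acquire` helper, the candidate names, and the 30-second duration are all made up for illustration, and a real controller would go through the Kubernetes API with optimistic concurrency rather than mutate a local object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """In-memory stand-in for a coordination.k8s.io Lease (illustrative only)."""
    holder: Optional[str] = None
    renew_time: float = 0.0
    duration_seconds: float = 30.0  # assumed value, not Linkerd's setting


def try_acquire(lease: Lease, candidate: str, now: float) -> bool:
    """Claim or renew the lease; return True if `candidate` is now the leader."""
    expired = now - lease.renew_time > lease.duration_seconds
    if lease.holder is None or lease.holder == candidate or expired:
        lease.holder = candidate
        lease.renew_time = now
        return True
    # Someone else holds a lease that has not expired yet: stay a follower.
    return False


lease = Lease()
print(try_acquire(lease, "destination-a", now=0.0))   # first claimant becomes leader
print(try_acquire(lease, "destination-b", now=10.0))  # lease is still held by destination-a
print(try_acquire(lease, "destination-b", now=45.0))  # destination-a never renewed, so b takes over
```

Only the current holder is allowed to write statuses; everyone else keeps retrying until the lease expires or is released.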

It seems that Argo excludes the resource. Looking at their documentation, some groups are excluded by default (e.g. events.k8s.io). I would double-check whether that is also the case for the coordination.k8s.io group.

In the documentation I linked, Argo also seems to support group inclusion, so you could try to explicitly include the group and see if on a subsequent deploy the policy-controller-write resource is created.

Hi Matei!

Thanks for clarifying the usage of the Lease object.

I found that Argo excludes all coordination.k8s.io objects by default, just like events.k8s.io, which you already mentioned. I believe this default cannot be changed (yet), for example by adding the group to the inclusions.

I also found that there is an open issue in the Argo GitHub repo which describes this issue. Lots of Linkerd users over there. :slight_smile:

We are facing the same issue and had to revert to 2.12.4.

We’ll see what we can do on our side. This seems to be a problem with Argo hardcoding exclusion groups though. As a workaround, you can manually create the lease resource.
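For anyone who wants to try the workaround, here is a small sketch that prints a bare Lease manifest as JSON. The lease name comes from the error message in the first post; the `linkerd` namespace and the `lease_manifest` helper are assumptions, and the manifest deliberately has no `spec`. Whether the policy-controller expects extra fields or labels is not confirmed here, so compare the output with the Lease template in the chart version you deploy.

```python
import json


def lease_manifest(name: str = "policy-controller-write",
                   namespace: str = "linkerd") -> dict:
    """Build a minimal Lease object (namespace default is an assumption)."""
    return {
        "apiVersion": "coordination.k8s.io/v1",
        "kind": "Lease",
        "metadata": {"name": name, "namespace": namespace},
    }


# kubectl accepts JSON as well as YAML, so the output can be piped into it.
print(json.dumps(lease_manifest(), indent=2))
```

Saved as a script (the file name is up to you), the output can be piped into `kubectl apply -f -`.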

We have a lot of k8s environments, which is one reason we use Argo, so manual steps do not really work for us. This was my final test before pushing Linkerd out to production environments. It has been working fine on test/staging environments so far, but I had to make sure that major upgrades of Linkerd also work.
We are mostly on k8s 1.24.6 and Argo 2.5.15. Hopefully we can find a way to make this deployment smooth as well.

I think this might be a behavioral difference or bug in the Linkerd components. I run a few other apps that use Kubernetes Lease resources, and they are deployed with ArgoCD without any manual steps. Those apps do not appear to apply Lease manifests to the cluster at install time. Instead, the apps themselves appear to create the Lease resource in the cluster and update it. Since the app creates the Lease object itself at runtime, ArgoCD does not need to manage the object.

One of the apps that uses leases and has no issues being deployed with ArgoCD is cert-manager. Looking at the code, it appears cert-manager creates the Lease object if it does not exist and then manages it automatically. The Helm chart for cert-manager has an RBAC role that allows the service account to create, get, update, and patch Lease resources. The component that uses leases calls leaderelection.NewLeaderElector(), which appears to create the Lease object and refresh it. cert-manager uses a common Kubernetes Go module, k8s.io/client-go/tools/leaderelection, that helps manage leases. That module also has a good example of how to use it.
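The difference between the two behaviors can be sketched with an in-memory stand-in for the leases endpoint. Everything here (`FakeLeaseAPI`, `NotFound`, `get_only`, `get_or_create`) is hypothetical and only mimics the pattern; it is neither Linkerd's nor cert-manager's code. The error in the first post suggests a get-only startup, while cert-manager's behavior looks like get-or-create.

```python
class NotFound(Exception):
    """Stand-in for the API server's 404 response."""


class FakeLeaseAPI:
    """In-memory stand-in for the leases endpoint (not a real client)."""

    def __init__(self):
        self._leases = {}

    def get(self, name):
        if name not in self._leases:
            raise NotFound(f'leases.coordination.k8s.io "{name}" not found')
        return self._leases[name]

    def create(self, name):
        self._leases[name] = {"name": name, "holder": None}
        return self._leases[name]


def get_only(api, name):
    """Assume the lease was installed by the chart; fail if it is missing."""
    return api.get(name)


def get_or_create(api, name):
    """Create the lease at runtime when it does not exist yet."""
    try:
        return api.get(name)
    except NotFound:
        return api.create(name)


api = FakeLeaseAPI()
try:
    get_only(api, "policy-controller-write")
except NotFound as err:
    print("get-only startup fails:", err)

lease = get_or_create(api, "policy-controller-write")
print("get-or-create startup succeeds:", lease)
```

With get-or-create, the deploy tool never has to apply a Lease manifest, but the service account needs permission to create leases, as in the cert-manager RBAC role described above.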

@derektamsen I was just about to post the same thoughts :slight_smile: Lease objects are really runtime objects and should not be part of the Helm chart. It sounds doable for Linkerd to fix this issue.

Is there a Linkerd GitHub issue for tracking this?

Not yet. Could you open one?

Created an issue here:

There was an issue for this already:

Current workaround: manual deployment of the lease resource.

There was an issue for this indeed, but that is in the Argo repository :slight_smile:
I created an issue in the Linkerd repo, because I think that this issue can be fixed by Linkerd instead of by Argo.

That’s right. My bad, I totally lost track of which of these issues was which.

@Linky Can you create a Linkerd issue with the information from your initial support post? The rest of us on this thread can then add our 2 cents to that issue. This is IMHO a Linkerd issue and not an ArgoCD issue.

Oh, sorry. I see there is an issue there already.

Hey everyone,

Thanks for bringing this to our attention. We’re currently reviewing options for improving interoperability between ArgoCD and 2.13’s introduction of the Lease resource. We’ll have an update on the plan very soon.

Best,

Eric Anderson
VP, Engineering
Buoyant, Inc. - The creators of Linkerd

Edge release 23.4.3 addresses this issue. It will be promoted to 2.13.3 in the coming 1-2 weeks.

Best,

Eric Anderson
VP, Engineering
Buoyant, Inc. - The creators of Linkerd
