ERROR: failed to verify issuer credentials for 'identity.linkerd.cluster.local' with trust anchors: x509: certificate has expired or is not yet valid

We deployed Linkerd using the Helm charts.
We use cert-manager to renew the certificates automatically. Each certificate is valid for 48h, and renewal is set to 24h. At random intervals, roughly every month or two, the following warning appears:

{"log":"time=\"2023-12-13T13:40:50Z\" level=warning msg=\"Skipping issuer update as certs could not be read from disk: failed to verify issuer credentials for 'identity.linkerd.cluster.local' with trust anchors: x509: certificate has expired or is not yet valid: current time 2023-12-13T13:40:50Z is after 2023-12-13T13:40:08Z - Current Time : 2023-12-13 13:40:50.279788145 +0000 UTC m=+5023800.297985164 - Invalid before 2023-12-13 13:40:13 +0000 UTC - Invalid After 2023-12-15 13:40:13 +0000 UTC\"","logtag":"F"}

After this, none of the Linkerd proxy certificates were renewed. All of them failed with the error below:

{"log":"time=\"2023-12-13T13:51:06Z\" level=error msg=\"could not process CSR because of CA cert validation failure: x509: certificate has expired or is not yet valid: current time 2023-12-13T13:51:06Z is after 2023-12-13T13:40:08Z - Current Time : 2023-12-13 13:51:06.01223291 +0000 UTC m=+5024416.030429929 - Invalid before 2023-12-12 13:40:13 +0000 UTC - Invalid After 2023-12-14 13:40:13 +0000 UTC - CSR Identity : audit-service.serviceaccount.identity.linkerd.cluster.local\"","logtag":"F"}

And on the proxy containers:

{"log":"[5081495.311735s] ERROR ThreadId(02) identity: linkerd_proxy_identity_client::certify: Failed to obtain identity error=status: Unknown, message: \"x509: certificate has expired or is not yet valid: current time 2023-12-13T13:51:06Z is after 2023-12-13T13:40:08Z - Current Time : 2023-12-13 13:51:06.01223291 +0000 UTC m=+5024416.030429929 - Invalid before 2023-12-12 13:40:13 +0000 UTC - Invalid After 2023-12-14 13:40:13 +0000 UTC\", details: [], metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Wed, 13 Dec 2023 13:51:06 GMT\"} }","logtag":"F"}

The only fix we have found is to restart the control plane, but in the meantime all our applications are down. This happens only in Prod.
Is there a solution for this, please?

The first error you point out has some contradictory data:

certificate has expired or is not yet valid: current time 2023-12-13T13:40:50Z is after 2023-12-13T13:40:08Z
Invalid After 2023-12-15 13:40:13 +0000 UTC

Can you post the full issuer certificate?
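
To make the contradiction concrete, here is a minimal Python sketch that pulls the timestamps out of the first warning and compares them. The regular expressions and the helper name `parse_times` are my own, purely for illustration; nothing here is part of Linkerd.

```python
import re
from datetime import datetime, timezone

# Message text copied from the first warning in this thread (relevant part only).
MSG = (
    "x509: certificate has expired or is not yet valid: "
    "current time 2023-12-13T13:40:50Z is after 2023-12-13T13:40:08Z "
    "- Invalid before 2023-12-13 13:40:13 +0000 UTC "
    "- Invalid After 2023-12-15 13:40:13 +0000 UTC"
)

def _utc(text, fmt):
    """Parse a timestamp string and mark it as UTC."""
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)

def parse_times(msg):
    """Hypothetical helper: extract the four timestamps from the message."""
    now, expired = re.search(r"current time (\S+) is after (\S+)", msg).groups()
    before = re.search(r"Invalid before (\S+ \S+) \+0000 UTC", msg).group(1)
    after = re.search(r"Invalid After (\S+ \S+) \+0000 UTC", msg).group(1)
    return {
        "now": _utc(now, "%Y-%m-%dT%H:%M:%SZ"),
        "expired_at": _utc(expired, "%Y-%m-%dT%H:%M:%SZ"),
        "not_before": _utc(before, "%Y-%m-%d %H:%M:%S"),
        "not_after": _utc(after, "%Y-%m-%d %H:%M:%S"),
    }

t = parse_times(MSG)
# The logged validity window contains the current time ...
in_window = t["not_before"] <= t["now"] <= t["not_after"]
# ... yet the x509 error names an expiry that is not the logged "Invalid After".
expiry_matches_window = t["expired_at"] == t["not_after"]
print(in_window, expiry_matches_window)  # True False
```

Since the expiry in the x509 error does not match the logged validity window, it presumably belongs to a different certificate in the chain being verified.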

After troubleshooting on our end, we realized that the problem is the root certificate (the trust anchor) being rotated every 60 days by cert-manager. Because the certificates are mounted as volumes, the identity pods have to be restarted for the change to take effect.
We did not catch this earlier because the issue occurred a month after the certificates were rotated (the previous certificates were valid for 90 days).
We are currently considering either manually restarting the identity pods every 75 days or using a certificate with a longer validity. But we need a proper solution. Any suggestions would be very helpful.
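
The one-month delay follows from the numbers above. A rough Python sketch, assuming the identity pod keeps using the trust anchor it read at startup until it is restarted; the issue date is made up for illustration:

```python
from datetime import datetime, timedelta, timezone

VALIDITY = timedelta(days=90)      # lifetime of each trust anchor (from the post)
ROTATE_EVERY = timedelta(days=60)  # cert-manager rotation interval (from the post)

# Hypothetical issue date of the anchor the identity pod loaded at startup.
issued = datetime(2023, 9, 14, tzinfo=timezone.utc)

rotated = issued + ROTATE_EVERY   # a new anchor is written to the secret
stale_expiry = issued + VALIDITY  # the old anchor still held by the pod expires

# Time between the rotation and the first failures if nothing is restarted.
gap = stale_expiry - rotated
print(gap.days)  # 30
```

Under the same assumption, a fixed 75-day restart timer drifts relative to a 60-day rotation, so a restart would probably have to be tied to the rotation itself rather than to the calendar.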

We seem to have the same problem, except that we set extremely short rotation cycles so that we could check everything early on. The trust anchor certificate has a duration of 24h, the issuer certificate 6h. In the beginning I also set a custom renewBefore in the certificates, but now I’m using cert-manager's default setting. It worked for about a month, and since Dec 28th it keeps breaking. I restart everything, it works, and the next day I see that it has broken again.

Our certificates are auto-renewed by cert-manager, and the linkerd-identity pods use a ConfigMap controlled by a Bundle (via trust-manager) as the trust-roots volume.

When it breaks, linkerd check looks like this:

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks contains all node podCIDRs
√ cluster networks contains all pods
√ cluster networks contains all services

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
‼ trust anchors are valid for at least 60 days
    Anchors expiring soon:
        * 26318400229796391699423318841621534888 root.linkerd.cluster.local will expire on 2024-01-09T20:22:44Z
    see https://linkerd.io/2.14/checks/#l5d-identity-trustAnchors-not-expiring-soon for hints
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
‼ issuer cert is valid for at least 60 days
    issuer certificate will expire on 2024-01-09T14:22:45Z
    see https://linkerd.io/2.14/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
‼ proxy-injector cert is valid for at least 60 days
    certificate will expire on 2024-01-09T14:22:45Z
    see https://linkerd.io/2.14/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints
√ sp-validator webhook has valid cert
‼ sp-validator cert is valid for at least 60 days
    certificate will expire on 2024-01-09T14:22:45Z
    see https://linkerd.io/2.14/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints
√ policy-validator webhook has valid cert
‼ policy-validator cert is valid for at least 60 days
    certificate will expire on 2024-01-09T14:22:52Z
    see https://linkerd.io/2.14/checks/#l5d-policy-validator-webhook-cert-not-expiring-soon for hints

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ can retrieve the control plane version
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
√ control plane proxies are up-to-date
√ control plane proxies and cli versions match

linkerd-ha-checks
-----------------
√ pod injection disabled on kube-system

linkerd-jaeger
--------------
√ linkerd-jaeger extension Namespace exists
√ jaeger extension pods are injected
√ jaeger injector pods are running
‼ jaeger extension proxies are healthy
    Some pods do not have the current trust bundle and must be restarted:
        * jaeger-injector-57fc7b6c56-76f74
    see https://linkerd.io/2.14/checks/#l5d-jaeger-proxy-healthy for hints
√ jaeger extension proxies are up-to-date
√ jaeger extension proxies and cli versions match

linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ can initialize the client
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
‼ tap API server cert is valid for at least 60 days
    certificate will expire on 2024-01-09T14:22:50Z
    see https://linkerd.io/2.14/checks/#l5d-tap-cert-not-expiring-soon for hints
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
‼ viz extension proxies are healthy
    Some pods do not have the current trust bundle and must be restarted:
        * metrics-api-787845b69d-k7fsf
        * tap-6f5d8b9448-jsh9z
        * tap-injector-687fcdff96-tkrkh
        * web-647c9c5855-9shtm
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-healthy for hints
√ viz extension proxies are up-to-date
√ viz extension proxies and cli versions match
√ viz extension self-check

Status check results are √

I run it in HA mode. An example log from one of the identity pods:

...
...
2024-01-08T16:43:54.880894096Z time="2024-01-08T16:43:54Z" level=info msg="issued certificate for tempo.tracing.serviceaccount.identity.linkerd.cluster.local until 2024-01-08 22:22:45 +0000 UTC: 2aa8e1fcfd4adc285b50ece886a7e057da6126a07c71efca539f49d102befb03"
2024-01-08T16:43:56.835814041Z time="2024-01-08T16:43:56Z" level=info msg="issued certificate for default.opentelemetry-operator-system.serviceaccount.identity.linkerd.cluster.local until 2024-01-08 22:22:45 +0000 UTC: 14b054e98c1a4e3ccbf1eae6510391fb35dd962c3d9930ace6789c497af75baf"
2024-01-08T20:22:54.115974712Z time="2024-01-08T20:22:54Z" level=info msg="Updated identity issuer"
2024-01-08T20:41:04.306370574Z time="2024-01-08T20:41:04Z" level=info msg="issued certificate for linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local until 2024-01-09 02:22:45 +0000 UTC: 7cc39a37029fc8e0372cd002b9bcdea3871fbcdf06564d40552613f6e8ac4e13"
2024-01-08T20:41:04.407563899Z time="2024-01-08T20:41:04Z" level=info msg="issued certificate for linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local until 2024-01-09 02:22:45 +0000 UTC: 6a935eaeffedd2cab9f71c91aaab93eae6b9dcdc57983872d87880b7b409e9a2"
20...
2024-01-08T20:41:06.571006602Z time="2024-01-08T20:41:06Z" level=info msg="issued certificate for default.opentelemetry-operator-system.serviceaccount.identity.linkerd.cluster.local until 2024-01-09 02:22:45 +0000 UTC: 588c673ae9b5c03d818179bf96ba705ab0a69daf12594daf9dd5ddc9da6f0461"
2024-01-09T00:23:36.152488605Z time="2024-01-09T00:23:36Z" level=info msg="Updated identity issuer"
2024-01-09T00:40:14.819432718Z time="2024-01-09T00:40:14Z" level=info msg="issued certificate for linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local until 2024-01-09 06:22:45 +0000 UTC: dfe09fe91401cc81a620454be3ce8dec3194a06e509576699df86a9283f00cde"
2024-01-09T00:40:14.841552376Z time="2024-01-09T00:40:14Z" level=info msg="issued certificate for linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local until 2024-01-09 06:22:45 +0000 UTC: 0f4df6b33dc748b525d6ac2708465495e573913d72ba7c113d581cb29922e6c2"
...
2024-01-09T00:40:15.484241474Z time="2024-01-09T00:40:15Z" level=info msg="issued certificate for default.opentelemetry-operator-system.serviceaccount.identity.linkerd.cluster.local until 2024-01-09 06:22:45 +0000 UTC: 6a1b57377bc9fa447f798bac314626be1d3cdd56543363870cad44e0b5c4f992"
2024-01-09T04:23:06.164455375Z time="2024-01-09T04:23:06Z" level=warning msg="Skipping issuer update as certs could not be read from disk: failed to verify issuer credentials for 'identity.linkerd.cluster.local' with trust anchors: x509: certificate has expired or is not yet valid: current time 2024-01-09T04:23:06Z is after 2024-01-09T04:22:44Z - Current Time : 2024-01-09 04:23:06.163801424 +0000 UTC m=+76664.387505192 - Invalid before 2024-01-09 04:22:45 +0000 UTC - Invalid After 2024-01-09 10:22:45 +0000 UTC"
2024-01-09T04:39:59.952187568Z time="2024-01-09T04:39:59Z" level=error msg="could not process CSR because of CA cert validation failure: x509: certificate has expired or is not yet valid: current time 2024-01-09T04:39:59Z is after 2024-01-09T04:22:44Z - Current Time : 2024-01-09 04:39:59.951553053 +0000 UTC m=+77678.175256790 - Invalid before 2024-01-09 00:22:45 +0000 UTC - Invalid After 2024-01-09 06:22:45 +0000 UTC - CSR Identity : linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local"
2024-01-09T04:40:00.072902481Z time="2024-01-09T04:40:00Z" level=error msg="could not process CSR because of CA cert validation failure: x509: certificate has expired or is not yet valid: current time 2024-01-09T04:40:00Z is after 2024-01-09T04:22:44Z - Current Time : 2024-01-09 04:40:00.072337696 +0000 UTC m=+77678.296041434 - Invalid before 2024-01-09 00:22:45 +0000 UTC - Invalid After 2024-01-09 06:22:45 +0000 UTC - CSR Identity : default.component-sia-platform.serviceaccount.identity.linkerd.cluster.local"
...
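
The `m=+76664.387505192` suffix in the warning is Go's monotonic clock reading, roughly the number of seconds since the process started. A small Python sketch, using only values from the log above, estimates when this identity pod started and when the expired certificate named in the error would have been issued if it has the 24h trust anchor duration mentioned earlier. Treating the expired certificate as a trust anchor is an assumption on my part.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the warning at 2024-01-09T04:23:06Z (nanoseconds truncated).
log_time = datetime(2024, 1, 9, 4, 23, 6, 163801, tzinfo=timezone.utc)
uptime = timedelta(seconds=76664.387505)
expired_at = datetime(2024, 1, 9, 4, 22, 44, tzinfo=timezone.utc)

# Assumption: the expired certificate is a trust anchor with a 24h duration.
ANCHOR_DURATION = timedelta(hours=24)

pod_started = log_time - uptime
anchor_issued = expired_at - ANCHOR_DURATION

print(pod_started.isoformat(timespec="seconds"))    # 2024-01-08T07:05:21+00:00
print(anchor_issued.isoformat(timespec="seconds"))  # 2024-01-08T04:22:44+00:00
# True if the pod started while that anchor was within its validity period.
print(anchor_issued <= pod_started <= expired_at)   # True
```

That would be consistent with the pod still holding the anchor it loaded at startup. The anchor that linkerd check reports as current expires at 2024-01-09T20:22:44Z, 16 hours after the expiry named in the error.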

The logs of the linkerd-proxy containers look like this:

[ 93518.182957s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}:endpoint{addr=10.245.203.6:8080}: linkerd_reconnect: Failed to connect error=endpoint 10.245.203.6:8080: invalid peer certificate: Expired error.sources=[invalid peer certificate: Expired]
[ 93518.390518s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}: linkerd_stack::failfast: Service entering failfast after 3s
[ 93518.390623s] ERROR ThreadId(02) identity: linkerd_proxy_identity_client::certify: Failed to obtain identity error=status: Unknown, message: "controller linkerd-identity-headless.linkerd.svc.cluster.local:8080: service in fail-fast", details: [], metadata: MetadataMap { headers: {} } error.sources=[controller linkerd-identity-headless.linkerd.svc.cluster.local:8080: service in fail-fast, service in fail-fast]
[ 93528.395376s] WARN ThreadId(02) identity:controller{addr=linkerd-identity-headless.linkerd.svc.cluster.local:8080}:endpoint{addr=10.245.141.170:8080}: linkerd_reconnect: Failed to connect error=endpoint 10.245.141.170:8080: invalid peer certificate: Expired error.sources=[invalid peer certificate: Expired]

Our issuer cert looks like this:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      ...
  creationTimestamp: "2023-10-13T12:08:46Z"
  generation: 2
  managedFields:
  - ...
    manager: cert-manager-certificates-readiness
    operation: Update
    subresource: status
    time: "2024-01-09T08:22:48Z"
  name: linkerd-identity-issuer
  namespace: linkerd
  resourceVersion: "366505900"
  uid: 7b9373c2-a172-4cc5-be1d-660e49629bd2
spec:
  commonName: identity.linkerd.cluster.local
  duration: 6h0m0s
  isCA: true
  issuerRef:
    kind: ClusterIssuer
    name: linkerd-trust-anchor
  privateKey:
    algorithm: ECDSA
  secretName: linkerd-identity-issuer
  usages:
  - cert sign
  - crl sign
  - server auth
  - client auth
status:
  conditions:
  - lastTransitionTime: "2024-01-05T12:22:48Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 2
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2024-01-09T14:22:45Z"
  notBefore: "2024-01-09T08:22:45Z"
  renewalTime: "2024-01-09T12:22:45Z"
  revision: 1033

I would be very grateful for any help, and am happy to give more information.

Should this thread perhaps be closed with a reference to the GitHub discussion "ERROR: failed to verify issuer credentials for 'identity.linkerd.cluster.local' with trust anchors: x509: certificate has expired or is not yet valid" (linkerd/linkerd2, Discussion #11783)?