Proxy liveness probe timeout in Envoy pod on GKE 1.33

Hello,

My GKE cluster was auto-updated from Kubernetes 1.32 to 1.33 this weekend (I was on the rapid channel). After the update, the Envoy ingress pods (managed by Contour, injected with linkerd.io/inject: ingress) started flip-flopping between Ready and NotReady. It turns out the Linkerd proxy's liveness probe (http://:4191/live) was failing with context deadline exceeded (Client.Timeout exceeded while awaiting headers).
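
For anyone wanting to reproduce the probe failure by hand, something like this should mimic what the kubelet does; the pod name is a placeholder, and it assumes ephemeral debug containers are available on the cluster, since the proxy image doesn't ship a shell:

# Attach a throwaway curl container to an affected Envoy pod and hit
# the proxy admin endpoint directly, with the same 1s budget as the probe.
kubectl -n projectcontour debug -it envoy-<pod> \
  --image=curlimages/curl --target=linkerd-proxy -- \
  curl -sv --max-time 1 http://localhost:4191/live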

Removing the Linkerd proxies from the Envoy pods (linkerd.io/inject: disabled) fixed the issue.
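
Concretely, something like this is what I mean by disabling injection; the namespace and DaemonSet name are taken from the describe output further down, and swapping disabled for ingress re-enables it:

# Turn proxy injection off on the Envoy DaemonSet's pod template;
# the template change itself triggers a rolling update of the pods.
kubectl -n projectcontour patch daemonset envoy --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"linkerd.io/inject":"disabled"}}}}}'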

I was still on Linkerd enterprise-2.15.4, so I updated to 2.18.1. I re-enabled proxy injection in the Envoy pods, and at first it was working, but after a few hours I started seeing the same behaviour again (the liveness probe failing intermittently).
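
One thing worth double-checking after an upgrade like this, sketched below with the standard CLI as far as I know, is that the injected proxies actually picked up the new version, since existing pods keep the old sidecar until they are recreated:

# Compare control-plane and data-plane versions, then run the
# proxy-specific checks for the workloads in the Envoy namespace.
linkerd version --proxy
linkerd check --proxy -n projectcontour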

So I disabled proxy injection again and everything is working for now, but I’d like to get to the bottom of this and ideally be able to re-enable proxy injection in my Envoy pods.

According to this, the maximum supported Kubernetes version is 1.32. Does that mean that there are known issues with 1.33, or simply that it hasn’t been tested with 1.33?

Thanks for letting us know about this. I don’t have any immediate leads but I’ll bring it up with the team.

The latter. I’m pretty sure I’ve run Linkerd on 1.33 locally, but we haven’t yet updated CI.

I’ve definitely run Linkerd on Kubernetes 1.33 locally with k3d, for the record.

(I do get some complaints when installing Linkerd on 1.33:

service/linkerd-policy-validator created
Warning: spec.template.spec.containers[2].ports[1]: duplicate port name "admin-http" with spec.template.spec.containers[1].ports[1], services and probes that select ports by name will use spec.template.spec.containers[1].ports[1]
Warning: spec.template.spec.containers[3].ports[0]: duplicate port name "grpc" with spec.template.spec.containers[1].ports[0], services and probes that select ports by name will use spec.template.spec.containers[1].ports[0]
Warning: spec.template.spec.containers[3].ports[1]: duplicate port name "admin-http" with spec.template.spec.containers[1].ports[1], services and probes that select ports by name will use spec.template.spec.containers[1].ports[1]

but things seem to work for me.)
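
If it helps with reproducing, a local setup along these lines should work; the k3s image tag is only an example of a 1.33 build, so adjust it to whatever is current:

# Create a local Kubernetes 1.33 cluster and install Linkerd into it.
k3d cluster create l5d-133 --image rancher/k3s:v1.33.1-k3s1
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd check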

Thanks for the replies. I realize that I neglected to grab the proxy logs while I was in the problematic situation, so during the next low-usage period I'll re-enable the proxies in the Envoy pods and capture the logs this time. Hopefully that will help figure out what's going on.
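
Roughly, the plan for that is to raise the proxy log level on the Envoy DaemonSet and pull the linkerd-proxy logs and metrics while the probes are failing; the pod name below is a placeholder and the annotation is the standard Linkerd one, as far as I know:

# Raise the proxy log level via annotation (this rolls the pods),
# then capture proxy logs and metrics during a failure window.
kubectl -n projectcontour patch daemonset envoy --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"config.linkerd.io/proxy-log-level":"debug"}}}}}'
kubectl -n projectcontour logs ds/envoy -c linkerd-proxy --timestamps
linkerd diagnostics proxy-metrics -n projectcontour po/envoy-<pod>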

Meanwhile, in case it's useful, here's the output of kubectl describe on one of the Envoy pods while it was in the problematic situation:

Name:             envoy-[REDACTED]
Namespace:        projectcontour
Priority:         0
Service Account:  envoy
Node:             [REDACTED]/[REDACTED]
Start Time:       [REDACTED]
Labels:           app=envoy
                  controller-revision-hash=5d7f5d8b6f
                  linkerd.io/control-plane-ns=linkerd
                  linkerd.io/proxy-daemonset=envoy
                  linkerd.io/workload-ns=projectcontour
                  pod-template-generation=74
Annotations:      config.linkerd.io/skip-outbound-ports: 8001
                  kubectl.kubernetes.io/restartedAt: [REDACTED]
                  linkerd.io/created-by: linkerd/proxy-injector enterprise-2.15.4
                  linkerd.io/inject: ingress
                  linkerd.io/proxy-version: enterprise-2.15.4
                  linkerd.io/trust-root-sha256: [REDACTED]
                  viz.linkerd.io/tap-enabled: true
Status:           Running
IP:               [REDACTED]
IPs:
  IP:           [REDACTED]
Controlled By:  DaemonSet/envoy
Init Containers:
  envoy-initconfig:
    Container ID:  containerd://[REDACTED]
    Image:         ghcr.io/projectcontour/contour:v1.32.0
    Image ID:      ghcr.io/projectcontour/contour@sha256:ec225b3a964cc4e5d713dfe4ac912f1e2b806d2bac56c474bc04819f67b80cb0
    Port:          <none>
    Host Port:     <none>
    Command:
      contour
    Args:
      bootstrap
      /config/envoy.json
      --xds-address=contour
      --xds-port=8001
      --xds-resource-version=v3
      --resources-dir=/config/resources
      --envoy-cafile=/certs/ca.crt
      --envoy-cert-file=/certs/tls.crt
      --envoy-key-file=/certs/tls.key
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      [REDACTED]
      Finished:     [REDACTED]
    Ready:          True
    Restart Count:  0
    Environment:
      CONTOUR_NAMESPACE:  projectcontour (v1:metadata.namespace)
    Mounts:
      /certs from envoycert (ro)
      /config from envoy-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-[REDACTED] (ro)
  linkerd-init:
    Container ID:    containerd://[REDACTED]
    Image:           ghcr.io/buoyantio/proxy-init:enterprise-2.15.4
    Image ID:        ghcr.io/buoyantio/proxy-init@sha256:335f6637ca8dfbf1fd3f30561686ca67b31119341fc0cea5cc995269ec2692f8
    Port:            <none>
    Host Port:       <none>
    SeccompProfile:  RuntimeDefault
    Args:
      --incoming-proxy-port
      4143
      --outgoing-proxy-port
      4140
      --proxy-uid
      2102
      --inbound-ports-to-ignore
      4190,4191,4567,4568
      --outbound-ports-to-ignore
      8001
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      [REDACTED]
      Finished:     [REDACTED]
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  20Mi
    Requests:
      cpu:        100m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /run from linkerd-proxy-init-xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-[REDACTED] (ro)
Containers:
  linkerd-proxy:
    Container ID:    containerd://[REDACTED]
    Image:           ghcr.io/buoyantio/proxy:enterprise-2.15.4
    Image ID:        ghcr.io/buoyantio/proxy@sha256:ac96e9c1cbe0363bb977f275f21fb29a34548609b2d09c9a21fbd46a7d13cc81
    Ports:           4143/TCP, 4191/TCP
    Host Ports:      0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    State:           Running
      Started:       [REDACTED]
    Ready:           True
    Restart Count:   0
    Limits:
      memory:  200Mi
    Requests:
      cpu:      10m
      memory:   20Mi
    Liveness:   http-get http://:4191/live delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:4191/ready delay=2s timeout=1s period=10s #success=1 #failure=3
    Environment:
      _pod_name:                                                envoy-[REDACTED] (v1:metadata.name)
      _pod_ns:                                                  projectcontour (v1:metadata.namespace)
      _pod_nodeName:                                             (v1:spec.nodeName)
      LINKERD2_PROXY_LOG:                                       warn,trust_dns=error
      LINKERD2_PROXY_LOG_FORMAT:                                plain
      LINKERD2_PROXY_DESTINATION_SVC_ADDR:                      linkerd-dst-headless.linkerd.svc.cluster.local.:8086
      LINKERD2_PROXY_DESTINATION_PROFILE_NETWORKS:              10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16
      LINKERD2_PROXY_POLICY_SVC_ADDR:                           linkerd-policy.linkerd.svc.cluster.local.:8090
      LINKERD2_PROXY_POLICY_WORKLOAD:                           {"ns":"$(_pod_ns)", "pod":"$(_pod_name)"}

      LINKERD2_PROXY_INBOUND_DEFAULT_POLICY:                    all-unauthenticated
      LINKERD2_PROXY_POLICY_CLUSTER_NETWORKS:                   10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16
      LINKERD2_PROXY_CONTROL_STREAM_INITIAL_TIMEOUT:            3s
      LINKERD2_PROXY_CONTROL_STREAM_IDLE_TIMEOUT:               5m
      LINKERD2_PROXY_CONTROL_STREAM_LIFETIME:                   1h
      LINKERD2_PROXY_INBOUND_CONNECT_TIMEOUT:                   100ms
      LINKERD2_PROXY_OUTBOUND_CONNECT_TIMEOUT:                  1000ms
      LINKERD2_PROXY_OUTBOUND_DISCOVERY_IDLE_TIMEOUT:           5s
      LINKERD2_PROXY_INBOUND_DISCOVERY_IDLE_TIMEOUT:            90s
      LINKERD2_PROXY_CONTROL_LISTEN_ADDR:                       0.0.0.0:4190
      LINKERD2_PROXY_ADMIN_LISTEN_ADDR:                         0.0.0.0:4191
      LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR:                      127.0.0.1:4140
      LINKERD2_PROXY_INBOUND_LISTEN_ADDR:                       0.0.0.0:4143
      LINKERD2_PROXY_INBOUND_IPS:                                (v1:status.podIPs)
      LINKERD2_PROXY_INBOUND_PORTS:                             8002,8080,8443
      LINKERD2_PROXY_INGRESS_MODE:                              true
      LINKERD2_PROXY_DESTINATION_PROFILE_SUFFIXES:              svc.cluster.local.
      LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE:                  10000ms
      LINKERD2_PROXY_OUTBOUND_CONNECT_KEEPALIVE:                10000ms
      LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION:  25,587,3306,4444,5432,6379,9300,11211
      LINKERD2_PROXY_DESTINATION_CONTEXT:                       {"ns":"$(_pod_ns)", "nodeName":"$(_pod_nodeName)", "pod":"$(_pod_name)"}

      _pod_sa:                                                   (v1:spec.serviceAccountName)
      _l5d_ns:                                                  linkerd
      _l5d_trustdomain:                                         cluster.local
      LINKERD2_PROXY_IDENTITY_DIR:                              /var/run/linkerd/identity/end-entity
      LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS:                    -----BEGIN CERTIFICATE-----
                                                                [REDACTED CERTIFICATE DATA]
                                                                -----END CERTIFICATE-----

      LINKERD2_PROXY_IDENTITY_TOKEN_FILE:                       /var/run/secrets/tokens/linkerd-identity-token
      LINKERD2_PROXY_IDENTITY_SVC_ADDR:                         linkerd-identity-headless.linkerd.svc.cluster.local.:8080
      LINKERD2_PROXY_IDENTITY_LOCAL_NAME:                       $(_pod_sa).$(_pod_ns).serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_IDENTITY_SVC_NAME:                         linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_DESTINATION_SVC_NAME:                      linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_POLICY_SVC_NAME:                           linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_TAP_SVC_NAME:                              tap.linkerd-viz.serviceaccount.identity.linkerd.cluster.local
    Mounts:
      /var/run/linkerd/identity/end-entity from linkerd-identity-end-entity (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-[REDACTED] (ro)
      /var/run/secrets/tokens from linkerd-identity-token (rw)
  shutdown-manager:
    Container ID:  containerd://[REDACTED]
    Image:         ghcr.io/projectcontour/contour:v1.32.0
    Image ID:      ghcr.io/projectcontour/contour@sha256:ec225b3a964cc4e5d713dfe4ac912f1e2b806d2bac56c474bc04819f67b80cb0
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/contour
    Args:
      envoy
      shutdown-manager
    State:          Running
      Started:      [REDACTED]
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  100M
    Requests:
      cpu:        5m
      memory:     100M
    Environment:  <none>
    Mounts:
      /admin from envoy-admin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-[REDACTED] (ro)
  envoy:
    Container ID:  containerd://[REDACTED]
    Image:         docker.io/envoyproxy/envoy:v1.34.1
    Image ID:      docker.io/envoyproxy/envoy@sha256:007da57c2c328a90bd4e6d99b70bd899132f1b4a9426ccafe25437cf84a60c14
    Ports:         8080/TCP, 8443/TCP, 8002/TCP
    Host Ports:    80/TCP, 443/TCP, 8002/TCP
    Command:
      envoy
    Args:
      -c
      /config/envoy.json
      --service-cluster $(CONTOUR_NAMESPACE)
      --service-node $(ENVOY_POD_NAME)
      --log-level warn
    State:          Running
      Started:      [REDACTED]
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  1G
    Requests:
      cpu:      500m
      memory:   1G
    Readiness:  http-get http://:8002/ready delay=3s timeout=1s period=4s #success=1 #failure=3
    Environment:
      CONTOUR_NAMESPACE:  projectcontour (v1:metadata.namespace)
      ENVOY_POD_NAME:     envoy-[REDACTED] (v1:metadata.name)
    Mounts:
      /admin from envoy-admin (rw)
      /certs from envoycert (ro)
      /config from envoy-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-[REDACTED] (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  envoy-admin:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  envoy-config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  envoycert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  envoycert
    Optional:    false
  kube-api-access-[REDACTED]:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    Optional:                false
    DownwardAPI:             true
  linkerd-proxy-init-xtables-lock:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  linkerd-identity-end-entity:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  linkerd-identity-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  16m                   default-scheduler  Successfully assigned projectcontour/envoy-[REDACTED] to [REDACTED]
  Normal   Pulling    16m                   kubelet            Pulling image "ghcr.io/projectcontour/contour:v1.32.0"
  Normal   Pulled     16m                   kubelet            Successfully pulled image "ghcr.io/projectcontour/contour:v1.32.0" in 879ms (879ms including waiting). Image size: 17656409 bytes.
  Normal   Created    16m                   kubelet            Created container: envoy-initconfig
  Normal   Started    16m                   kubelet            Started container envoy-initconfig
  Normal   Pulled     16m                   kubelet            Container image "ghcr.io/buoyantio/proxy-init:enterprise-2.15.4" already present on machine
  Normal   Created    16m                   kubelet            Created container: linkerd-init
  Normal   Started    16m                   kubelet            Started container linkerd-init
  Normal   Pulled     16m                   kubelet            Container image "ghcr.io/buoyantio/proxy:enterprise-2.15.4" already present on machine
  Normal   Created    16m                   kubelet            Created container: linkerd-proxy
  Normal   Started    16m                   kubelet            Started container linkerd-proxy
  Normal   Pulled     16m                   kubelet            Container image "ghcr.io/projectcontour/contour:v1.32.0" already present on machine
  Normal   Created    16m                   kubelet            Created container: shutdown-manager
  Normal   Started    16m                   kubelet            Started container shutdown-manager
  Normal   Pulled     16m                   kubelet            Container image "docker.io/envoyproxy/envoy:v1.34.1" already present on machine
  Normal   Created    16m                   kubelet            Created container: envoy
  Normal   Started    16m                   kubelet            Started container envoy
  Warning  Unhealthy  9m52s                 kubelet            Readiness probe failed: Get "http://[REDACTED]:4191/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  9m48s (x3 over 16m)   kubelet            Liveness probe failed: Get "http://[REDACTED]:4191/live": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  5m50s (x23 over 16m)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 502
  Warning  Unhealthy  79s (x26 over 16m)    kubelet            Readiness probe failed: Get "http://[REDACTED]:8002/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Those duplicate port name warnings have been addressed in the latest edge release.