Clarification on high number of pending outbound http balancer endpoints and metric definition

Question about the metric outbound_http_balancer_endpoints. My current understanding is that it reports the number of backend endpoints in the balancer that are in either a pending or ready state (Proxy Metrics | Linkerd). So it seems like the total number of endpoints (ready and pending) in the balancer for a given (backend_name, pod) pair should never exceed the total number of endpoints (presumably deduped across endpointslices) assigned to the backing object. For example, the sum of all endpoints (ready or pending) in the balancer for a service should equal the number of existing endpoints for that service; that is, sum by (backend_name, pod) (outbound_http_balancer_endpoints{backend_name="api"}) should equal (# of service addresses) * (# of service ports) in the cluster (per kubectl get endpoints api).
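A rough sketch of the comparison I have in mind (pod, namespace, and service names below are placeholders):

# Ready + pending counts for one backend as a single client proxy sees them.
linkerd diagnostics proxy-metrics -n team po/<client-pod> \
  | grep 'outbound_http_balancer_endpoints' | grep 'backend_name="api"'

# Number of addresses actually backing the service.
kubectl get endpoints api -n team -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w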

With that in mind, we currently have some pods whose outbound_http_balancer_endpoints metrics report more than 300 endpoints in the pending state, which does not seem possible given that the number of endpoints for the backend_name (in this case fairly normal services and trafficsplits) is less than 10.

Am I understanding the metric correctly? If so, is there a bug with removing pending backends from the balancer under certain conditions (we are currently on Linkerd 2.13.4)? If not, what is the expected cap on outbound_http_balancer_endpoints?

Some additional information:

  • I ran linkerd diagnostics endpoints api.team.svc.cluster.local against the cluster and only got 4 endpoints (for the 4 existing pods), while the linkerd proxies that are sending traffic to these services report hundreds of pending endpoints (though maybe diagnostics only prints ready endpoints?).
  • I ran the diagnostics command against each of our destination pods, and they all agreed with each other.
  • I double-checked the pods that had the high pending endpoint counts directly and confirmed that this is not a metrics-collection issue (a sketch of how to check a proxy directly is included after the dump):
# For pod A
outbound_http_balancer_endpoints{parent_group="core",parent_kind="Service",parent_namespace="team",parent_name="api-v2",parent_port="8080",parent_section_name="",backend_group="core",backend_kind="Service",backend_namespace="team",backend_name="api-v2-stable",backend_port="8080",backend_section_name="",endpoint_state="pending"} 81
# For pod B
outbound_http_balancer_endpoints{parent_group="core",parent_kind="Service",parent_namespace="team",parent_name="api-v2",parent_port="8080",parent_section_name="",backend_group="core",backend_kind="Service",backend_namespace="team",backend_name="api-v2-stable",backend_port="8080",backend_section_name="",endpoint_state="pending"} 81
# For pod C
outbound_http_balancer_endpoints{parent_group="core",parent_kind="Service",parent_namespace="team",parent_name="api-v2",parent_port="8080",parent_section_name="",backend_group="core",backend_kind="Service",backend_namespace="team",backend_name="api-v2-stable",backend_port="8080",backend_section_name="",endpoint_state="pending"} 81

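A sketch of how to check a pod's proxy metrics directly, assuming the default proxy admin port of 4191 and a placeholder pod name:

# Forward the proxy admin port of one of the affected pods.
kubectl port-forward -n team pod/<pod-a> 4191:4191 &

# Dump the pending-state gauge straight from the proxy.
curl -s http://localhost:4191/metrics \
  | grep 'outbound_http_balancer_endpoints' | grep 'endpoint_state="pending"'
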
Some more information about the service setup itself:

  • The client linkerd-proxy is meshing an nginx proxy. The upstream service (api-v2) is behind a trafficsplit with a stable and a canary backend. The stable service is the one with the highest number of pending endpoints, and it is the one whose selectors are switched quite frequently (as we change the set of pods that have been promoted from canary to stable, without restarting them).

Let me know if you need more information here! I'm just super curious why we are seeing such a high number of pending endpoints that shouldn't exist (given my current understanding).

Hi @jandersen,

Sorry for our delayed response to this well-written question. That does indeed look like unexpected behavior.

it seems like the total number of endpoints (ready and pending ) in the balancer for a given pair of (backend_name, pod) should never exceed the total number of endpoints

While this is generally true, it's not exact: since the ready and pending counts are instantaneous gauges that are not updated atomically, we frequently see them fluctuate around (both below and above) the total number of endpoints.
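
If you want to see that skew for yourself, a quick sketch (pod, namespace, and backend names are placeholders) is to poll the gauges and watch the ready + pending sum drift briefly while the balancer is being updated:

# Poll the balancer gauges for one backend; during updates the ready + pending
# sum can briefly sit above or below the real endpoint count.
while true; do
  linkerd diagnostics proxy-metrics -n team po/<client-pod> \
    | grep 'outbound_http_balancer_endpoints' | grep 'backend_name="api-v2-stable"'
  sleep 1
done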

This doesn’t really change anything about the behavior you’ve reported, where you’re seeing 81 endpoints when only 4 exist in the service. That likely indicates a bug.

The stable service is the one with the highest number of pending endpoints, and it is the one whose selectors are switched quite frequently (as we change the set of pods that have been promoted from canary to stable, without restarting them).

Are you able to provide more detail about this process of label selector changes? I’d like to work towards reproducing this bug by triggering similar changes. Are you simply changing the labelSelector on the Service resource? Are pod labels changing at all? Etc…

No worries on the delayed response – you guys are quite busy.

Thank you for the clarification on the endpoints! That makes sense and fits with my intuition here – locking to update the metric atomically might be too expensive anyway.

To clarify on the label selector changes for the services:

The pod labels should not be changing as far as I know – the selector label that changes, "rollouts-pod-template-hash", is unique to the pod template hash generated by the ReplicaSet, so promotion just repoints the stable service's selector at a different set of already-running pods.
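
To make that concrete, my understanding is that the promotion step effectively amounts to a patch like the one below on the stable service's selector (a hand-written approximation; the app label and hash value are placeholders, since the rollout controller manages the real selector for us):

# Illustrative only: repoint the stable service at the newly promoted ReplicaSet
# by swapping the rollouts-pod-template-hash value in its selector.
kubectl patch service api-v2-stable -n team --type merge \
  -p '{"spec":{"selector":{"app":"api-v2","rollouts-pod-template-hash":"<new-hash>"}}}'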

It has been a while since I have spun up a test cluster to check this metric or these selector changes, so if we want to continue down the practical debugging path I am more than happy to do so, but it may take me a bit to get back to you.
