Destination failed to send profile update

thradec · December 9, 2024, 2:07pm

Hi all, we are struggling to upgrade Linkerd in our cluster (from stable-2.14.1 to edge 24.10.5). The same upgrade procedure works fine on smaller clusters, but this cluster is relatively large (AWS, k8s 1.29, 75 nodes, 2k+ pods). The upgrade of the control plane itself finished fine, but once we started restarting the existing workloads to pick up the new proxy, the cluster quickly degraded into an unusable state. Luckily, rolling back to the original version helped. We found that the logs were flooded with thousands of messages.

From proxies:

"level": "WARN", "fields": { "message": "Unexpected policy controller response; retrying with a backoff", "grpc.status": "Deadline expired before operation could complete", "grpc.message": "initial item not received within timeout" }, "target": "linkerd_app::dst",

From destination:

"level": "error", "msg": "failed to send profile update: rpc error: code = Canceled desc = context canceled",

The Prometheus metrics for CPU/memory consumption didn't show any bottleneck. Still, we tried significantly increasing the resources and the number of replicas for the whole Linkerd control plane, which only delayed the issue slightly. We also tried increasing resources on individual proxies and raising proxy-inbound-connect-timeout/proxy-outbound-connect-timeout, unfortunately without success.

Thanks in advance for any ideas.

Mahendran · January 2, 2025, 3:13pm

The issue seems related to the control plane struggling under the cluster's scale. The logs point to timeouts with the policy and destination controllers. Make sure these components have enough resources, and check for network latency or DNS issues. Try restarting workloads in smaller batches, and consider upgrading to a stable version before moving to the edge release.
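
For the "smaller batches" suggestion, here is a minimal Python sketch of one way to do it. It is only an illustration, not a tested procedure for this cluster: it assumes `kubectl` is on the PATH and already pointed at the right cluster, and the helper names (`batches`, `run_kubectl`, `restart_in_batches`), the batch size, and the pause length are all made up for the example. It restarts a few Deployments, waits for them to finish rolling out, pauses so the destination and policy controllers can settle, and then moves on to the next batch.

```python
import subprocess
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_kubectl(args: List[str]) -> None:
    """Run kubectl with the given arguments; raise if it exits non-zero."""
    subprocess.run(["kubectl", *args], check=True)


def restart_in_batches(
    deployments: Sequence[str],
    namespace: str,
    batch_size: int = 5,          # assumed value, tune for your cluster
    pause_seconds: float = 60.0,  # assumed value, tune for your cluster
    run: Callable[[List[str]], None] = run_kubectl,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Restart Deployments a few at a time instead of all at once."""
    for batch in batches(deployments, batch_size):
        # Trigger the restart for every Deployment in this batch.
        for name in batch:
            run(["rollout", "restart", f"deployment/{name}", "-n", namespace])
        # Block until each Deployment in the batch has finished rolling out.
        for name in batch:
            run(["rollout", "status", f"deployment/{name}", "-n", namespace,
                 "--timeout=10m"])
        # Pause after each batch so the control plane can catch up.
        sleep(pause_seconds)
```

You would call it with something like `restart_in_batches(["api", "worker", "web"], "my-namespace", batch_size=2)`, where the Deployment names and namespace are placeholders. Watching the destination logs between batches should show at what batch size the "failed to send profile update" errors start to appear.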
