High Linkerd TCP Connection and timeout

johri21 · June 20, 2023, 10:07pm

We use Linkerd as a service mesh in production. Last evening one of our applications started timing out. This app (service B) is called by another of our services (service A), and the two communicate over Linkerd’s internal URL. Service A was calling service B, which was down.

While investigating, we found that the number of open connections was very high and had plateaued.

To resolve it, we restarted the app pod that had the high number of open TCP connections, and things started working again.

Now we are trying to figure out what happened, since no changes had been made to the system.

Questions I am grappling with:

Why did the TCP open connections spike so suddenly? The line on the graph goes up at almost 90 degrees.

[Screenshot 2023-06-18 at 11.54.53 PM]

Why did it plateau around 500? Why can’t we have more open connections?

Alex · June 26, 2023, 7:27pm

Hi @johri21

Without more information about the system or reproduction steps, it’s hard to say why this might have happened. One thing I’d recommend is looking at the proxy metrics (you can use the linkerd diagnostics proxy-metrics command) to figure out exactly where the connection spike is. For example, is it between the client and its sidecar proxy, between the two proxies, or between the server and its sidecar proxy?
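
As a rough illustration of that kind of breakdown, here is a minimal Python sketch that sums the tcp_open_connections gauge by its direction and peer labels. It assumes you have saved the proxy's Prometheus-format output to text first (for instance with something like linkerd diagnostics proxy-metrics -n <namespace> po/<pod>). The function name, the sample data, and the authority values are hypothetical, and the exact label set can vary between Linkerd versions, so treat this as a starting point rather than a definitive tool.

```python
import re
from collections import defaultdict

# Matches one label pair such as direction="inbound" inside the {...} block.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def open_connections_by_side(metrics_text, metric="tcp_open_connections"):
    """Sum a labeled gauge by its (direction, peer) labels.

    metrics_text is Prometheus text-format output. Comment lines and
    samples without a label block are skipped.
    """
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, brace, rest = line.partition("{")
        if name != metric or not brace:
            continue
        label_part, _, value_part = rest.rpartition("}")
        labels = dict(LABEL_RE.findall(label_part))
        value = float(value_part.split()[0])
        key = (labels.get("direction", "?"), labels.get("peer", "?"))
        totals[key] += value
    return dict(totals)


# Hypothetical sample, made up for illustration only.
SAMPLE = """
# TYPE tcp_open_connections gauge
tcp_open_connections{direction="inbound",peer="src"} 3
tcp_open_connections{direction="inbound",peer="dst"} 2
tcp_open_connections{direction="outbound",peer="dst",authority="b.example:80"} 480
tcp_open_connections{direction="outbound",peer="dst",authority="c.example:80"} 20
tcp_open_total{direction="outbound",peer="dst"} 9000
"""

if __name__ == "__main__":
    for (direction, peer), count in sorted(open_connections_by_side(SAMPLE).items()):
        print(f"{direction:9s} {peer:4s} {count:.0f}")
```

Running this against the metrics of both the client pod and the server pod shows which side of which proxy is holding the connections: inbound/src is traffic arriving at a proxy, inbound/dst is the proxy talking to its own application, and the outbound rows cover traffic leaving the pod.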
