Potential (slow) memory leak for linkerd-proxy sidecar for single workload on 2.13.3

neal · June 1, 2023, 5:03pm

Hello! I’ve noticed that we have a single deployment for which the linkerd-proxy container appears to have a slow memory leak. It currently has a memory limit of 500Mi for the container, and will take over a week to hit that limit, but memory usage for it really stands out among all of our other workloads.

I know this is probably nearly impossible to diagnose with the information I’ve given, but are there any properties of a workload that might interact with linkerd to precipitate something like this? As far as I know, we’re not having any operational problems with the workload. It’s meshed, subject to authz policies, and working fine.

I’ve attached a screenshot that shows memory usage for the linkerd-proxy container for all of our deployments. You can see that only a single one (out of dozens) shows this pattern.

william · June 2, 2023, 1:43pm

Which version is the proxy?

Edit: never mind, just saw that in the title. Interesting. Can you connect your deployment to Buoyant Cloud and send us a diagnostic bundle? That would be the quickest way to dig into this.

neal · June 2, 2023, 11:54pm

I pressed the button to send diagnostics. I was told I’d get an email, but I’m not sure that ever happened.

william · June 13, 2023, 10:42pm

Neal, I just wanted to follow up on this. Did you ever get an email with the diagnostic ID in it?

neal · June 15, 2023, 5:16pm

Hello! I don’t think I ever got an email.

william · June 15, 2023, 6:48pm

Interesting. I looked through the send logs and also don’t see anything. Can you try again? And note the timestamp?

neal · June 16, 2023, 12:27am

I just tried again at 5:25pm PDT. I didn’t receive a timestamp from Buoyant or anything, in case that’s what you mean by timestamp.

william · July 1, 2023, 2:25pm

Neal, I dropped the ball on replying to this. I trawled our outbound email logs and also don’t see any email sent to you at that time. Not sure why that would be. I will file an internal ticket for us to investigate. Sorry about that!

In terms of that one workload, can you describe the nature of the traffic a bit? What does it do that’s different from the other meshed workloads?

neal · July 3, 2023, 11:08pm

The workload takes requests that are large JSON blobs (mostly base64-encoded images) and returns JSON responses containing the scan results for those images. We have other workloads that are similar and are not experiencing a memory leak. I’m not sure what it is about this workload that distinguishes it from other meshed workloads.

At this point, we at least have a work-around: the ol’ reboot it daily. (The leak would probably take about 4-5 days to cause an OOM, so this is more aggressive than is necessary, but fine.)
