Memory leak issue on AWS EKS Node with aarch64 architecture after Edge-23.2.3

Hi all,

We use Linkerd for gRPC load balancing on AWS EKS. After upgrading to Linkerd stable-2.13.0, our k8s services intermittently hit the gRPC error 14 UNAVAILABLE: connections to all backends failing.

We traced this to a memory leak in slab memory (the system.mem.slab metric) on the AWS EKS nodes. It seems to happen only on AWS nodes with aarch64 (arm64) CPUs.

We confirmed that the issue first appears in Edge-23.2.3, and we suspect it was caused by bumping the slab dependency to 0.4.8 (PR).

  • Linkerd version: Edge-23.2.3
  • Platform
    • Kernel Name: Linux
    • Kernel Release: 5.10.167-147.601.amzn2.aarch64
    • Kernel Version: #1 SMP Tue Feb 14 21:50:23 UTC 2023
    • Processor: aarch64

Could you help us look into this? I sincerely appreciate your help.

This kind of thing is hard to investigate from the outside, but as a starting point could you try a more recent edge release, or 2.13.3?

Thanks for the suggestion. I will try the latest edge release, Edge-23.5.3, and post the result here.

Hi @william, we tried the latest stable release, 2.13.4, and the memory leak on AWS nodes with aarch64 (ARM64) CPUs still seems to be present.

Sorry for the delay. I think this warrants filing a GitHub issue. Please provide as much information as you can about how to reproduce this… I am not sure if we have access to an aarch64 AWS cluster, but a minimal repro case would be helpful.
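
One way to gather data for such a report is to sample slab usage directly on an affected node and on an unaffected one. The following is only a rough sketch, not something taken from this thread: it assumes the system.mem.slab metric corresponds to the Slab field in /proc/meminfo, and the function names (parse_meminfo, slab_snapshot, watch) are made up for illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most fields are reported in kB; a few (such as HugePages_Total) are
    plain counts with no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def slab_snapshot(path="/proc/meminfo"):
    """Return the slab-related fields from the given meminfo file."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    # Assumption: these three fields are the ones relevant to the leak.
    return {key: info.get(key) for key in ("Slab", "SReclaimable", "SUnreclaim")}


def watch(interval_s=60, samples=5):
    """Print a timestamped slab snapshot every interval_s seconds."""
    for _ in range(samples):
        print(time.strftime("%Y-%m-%dT%H:%M:%S"), slab_snapshot())
        time.sleep(interval_s)
```

Calling watch() on an aarch64 node and on an x86_64 node running the same workload, then attaching both outputs to the GitHub issue, would show whether Slab or SUnreclaim grows steadily on only one architecture.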