Issues with Prometheus getting evicted due to memory

This morning I noticed that we have been having issues that cause pod restarts. After digging into it, I think it is related to linkerd pods getting evicted because the Prometheus pod in that namespace is using a very large amount of memory. It got into a restart loop, and I had to delete the WAL files to get the pod to start. It's currently running with between 3.5 and 4.5 GB of memory, which is a LOT higher than any other Prometheus install on my cluster. It is easily the single biggest consumer of memory of any pod, by a factor of 4.

What can I do to reduce the amount of memory that this pod consumes?

For context, is this a Prometheus from linkerd-viz or one you manage?

This specifically is from linkerd-viz. We have others that we manage and we don’t see this issue with them.

Yeah, the linkerd-viz Prometheus holds all the metrics data in memory; it is definitely not appropriate for production usage. Check out the "Bringing your own Prometheus" guide in the Linkerd docs for how to replace it with a Prometheus that you manage (which, of course, could be one you already have running).

Ah.. and we did just scale up our infra yesterday. That kinda makes sense.
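
For anyone who wants to confirm the scale-up theory before swapping Prometheus out: Prometheus memory generally grows with the number of active (head) series, and its HTTP API exposes that count at /api/v1/status/tsdb. Below is a rough Python sketch, not anything from the Linkerd docs. The function names are made up for illustration, and the base URL in the usage comment assumes you have port-forwarded the linkerd-viz Prometheus to localhost:9090, so adjust it for your setup.

```python
import json
from urllib.request import urlopen


def fetch_tsdb_status(base_url):
    """Fetch TSDB statistics from Prometheus's /api/v1/status/tsdb endpoint."""
    with urlopen(f"{base_url}/api/v1/status/tsdb", timeout=10) as resp:
        return json.load(resp)


def summarize_tsdb_status(status, top_n=5):
    """Return (total head series, top metric names by series count).

    `status` is the parsed JSON body of /api/v1/status/tsdb. Older
    Prometheus versions may not report headStats, in which case the
    total comes back as None.
    """
    data = status.get("data", {})
    total = data.get("headStats", {}).get("numSeries")
    by_metric = data.get("seriesCountByMetricName", [])
    # Sort explicitly rather than relying on the server's ordering.
    top = sorted(by_metric, key=lambda e: e["value"], reverse=True)[:top_n]
    return total, [(e["name"], e["value"]) for e in top]


# Usage (assumed address; needs network access to Prometheus):
#   status = fetch_tsdb_status("http://localhost:9090")
#   total, top = summarize_tsdb_status(status)
#   print("head series:", total)
#   for name, count in top:
#       print(count, name)
```

If the head series count jumped after the scale-up, that would line up with the memory growth, since more meshed pods means more series for the bundled Prometheus to hold.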