This morning I noticed some issues causing pod restarts. After digging into it, I think it is related to linkerd pods getting evicted because the prometheus pod in that namespace is using a very large amount of memory. It got into a restart loop, and I had to delete the WAL files to get the pod to start. It's currently running with between 3.5 and 4.5 GB of memory, which is a LOT higher than any other prometheus install on my cluster. It is easily the single biggest consumer of memory of any pod, by a factor of 4.
What can I do to reduce the amount of memory that this pod consumes?
Yeah, the linkerd-viz Prometheus holds all the metrics data in memory; it is definitely not appropriate for production usage. Check out the "Bringing your own Prometheus" guide in the Linkerd docs for how to replace it with a Prometheus that you manage (which, of course, could be one you already have running).
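For what it's worth, here is a rough sketch of the Helm values involved, assuming you install linkerd-viz from its Helm chart. The Prometheus address below is a made-up placeholder (a service called prometheus in a monitoring namespace), so swap in whatever your own instance is actually called. Treat this as a starting point and check it against the guide rather than as a tested config.

```yaml
# values.yaml for the linkerd-viz chart (sketch only, not a tested config)
prometheus:
  enabled: false   # skip deploying the bundled Prometheus

# Hypothetical address of the Prometheus you manage; replace with your real one
prometheusUrl: http://prometheus.monitoring.svc.cluster.local:9090
```

Your own Prometheus also needs scrape jobs for the Linkerd proxies and control plane before the viz dashboard will show anything; the guide covers the scrape configuration to add.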