We are running Linkerd enterprise-2.16.1.
In our clusters we have Redis version 7.2.4 running with:
architecture: replication
sentinel:
  enabled: true
We use this helm chart:
This has been working fine. All Redis pods are meshed, and I can see that the proxy-injector also adds this annotation: config.linkerd.io/opaque-ports: "6379". We should probably add the Sentinel port 26379 there as well, but that has not been a problem.
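For reference, if we wanted Sentinel covered too, the pod-template annotation would just need both ports. A minimal sketch of what that would look like (how it gets wired into the chart's values, e.g. via a podAnnotations key, is an assumption on my part):

# sketch: pod-template metadata with both the Redis and Sentinel ports marked opaque
metadata:
  annotations:
    config.linkerd.io/opaque-ports: "6379,26379"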
Today I tried to upgrade Redis to version 7.4.1, with Helm chart version 20.2.1.
After the upgrade there are a lot of connection issues: Redis pods can no longer connect to each other, and applications cannot connect to Redis.
I see this type of error in the Linkerd proxy logs:
linkerd-proxy [ 200.305512s] INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.241.75.1:46842 server.addr=10.241.73.252:6379
linkerd-proxy [ 200.821799s] INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.241.75.1:52114 server.addr=10.241.73.252:26379
If I disable the Linkerd proxy for the Redis pods, everything works fine. But since we are using Linkerd, we want to run all the pods in our application namespace in the service mesh.
I need some help figuring out what is going on and how to fix it.
Adding this annotation to the Redis pods also makes Redis work: config.linkerd.io/skip-inbound-ports: "6379,26379"
Not really a solution, since we want to run all pods in the service mesh with mTLS.
The Redis pods' proxy logs will now be flooded with INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s errors.
And the Redis pods cannot connect to each other. @Flynn @william I would be grateful if someone could take a look at this issue.
I was wrong. It does not work with this annotation: config.linkerd.io/skip-inbound-ports: "6379,26379"
The only way to make the latest version of Redis work is to disable the Linkerd proxy.
Mmm. I had one occasion where the skip-inbound-ports annotation did not help as a workaround. Now it works again. But this is equivalent to disabling Linkerd for Redis, so it's not really a solution.
The errors are from the Redis pods' proxies. As you may know, the Redis pods connect to each other for both Sentinel and replication. From what I heard when things started failing, other apps using Redis had connection issues as well.
But to investigate, it is simplest to just deploy Redis with the configuration above; it will start failing right away. It works fine without Linkerd. It also works with the old version of Redis and Linkerd enabled, which was our configuration before we attempted the Redis upgrade.
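To spell out the repro a bit more: install the chart with the values above into a namespace that has Linkerd injection enabled, for example (the namespace name is just a placeholder):

# namespace annotated so Linkerd injects the proxy into all pods
apiVersion: v1
kind: Namespace
metadata:
  name: redis-test
  annotations:
    linkerd.io/inject: enabled

Then deploy the chart with architecture: replication and sentinel.enabled: true, and the connection errors show up as soon as the pods come up.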
Hi Jan, sounds like we were able to reproduce this locally, so the good news is it's not just you. If you have any ability to try different Redis versions to help us pinpoint the exact change that induced this behavior, that would be helpful. Otherwise we'll plug away and let you know what we find.
Hi,
I have some more info that might help you track down the issue.
Working versions of the Bitnami Redis Helm chart:
18.6.4 → redis:7.2.4-debian-11-r0
18.7.1 → redis:7.2.4-debian-11-r2
Non-working versions:
18.11.1 → redis:7.2.4-debian-11-r4 # Linkerd proxy fails to get into the ready state and eventually crashes
18.12.1 → redis:7.2.4-debian-11-r5 # Linkerd proxy starts but blocks traffic, so Redis is not working. Same as with the latest Helm chart and Redis version
Hi @Jan, we just ran into this as well. We managed to resolve it by disabling the NetworkPolicy, which the Bitnami Helm chart had silently enabled. Maybe this will work for you as well?
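Concretely, this is all we changed in our values (assuming you are on a chart version that exposes the top-level networkPolicy key, which newer versions of the chart seem to enable by default):

# disable the NetworkPolicy the chart now ships with
networkPolicy:
  enabled: false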
Hi @Andrew. This fixed the issue for us as well. I completely missed that they had added a NetworkPolicy. Thank you very much for letting me know. I'm fine with closing this issue now.