We are running Linkerd enterprise-2.16.1.
In our clusters we have Redis version 7.2.4 running with:
architecture: replication
sentinel:
  enabled: true
We use this helm chart:
This has been working fine. All Redis pods are meshed, and I see that the proxy-injector is also adding this annotation: config.linkerd.io/opaque-ports: "6379". We should probably add the Sentinel port 26379 there as well, but it has not been a problem so far.
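For reference, a minimal sketch of what the annotation would look like with the Sentinel port added. This is only the pod template metadata fragment; where it goes in the Helm values depends on the chart, and I have not verified that it changes anything about the timeouts described below:

```yaml
# Pod template metadata for the Redis pods (sketch, placement in the chart values is an assumption).
metadata:
  annotations:
    # Mark both the Redis port and the Sentinel port as opaque,
    # so the proxy skips protocol detection on them.
    config.linkerd.io/opaque-ports: "6379,26379"
```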
Today I tried to upgrade Redis to version 7.4.1 with Helm chart version 20.2.1.
After the upgrade there are a lot of connection issues. The Redis pods cannot connect to each other anymore, and applications cannot connect to Redis.
I see these types of errors in the Linkerd proxy log:
linkerd-proxy [ 200.305512s] INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.241.75.1:46842 server.addr=10.241.73.252:6379
linkerd-proxy [ 200.821799s] INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.241.75.1:52114 server.addr=10.241.73.252:26379
If I disable the Linkerd proxy for the Redis pods, everything works fine. But since we are using Linkerd, we want to run all the pods in our application namespace in the service mesh.
I need some help figuring out what is going on and how to fix it.
Adding this annotation to the Redis pods also makes Redis work: config.linkerd.io/skip-inbound-ports: "6379,26379"
Not really a solution, since we want to run all pods in the service mesh with mTLS.
The proxy logs of the Redis pods are now flooded with errors like this: INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1
And the Redis pods cannot connect to each other. @Flynn @william I would be grateful if someone could take a look at this issue.
I was wrong. It does not work with this annotation: config.linkerd.io/skip-inbound-ports: "6379,26379"
The only way to make the latest version of Redis work is to disable the Linkerd proxy.
Mmm. I had one occasion where the skip-inbound-ports annotation did not help as a workaround. Now it works again. But this is equivalent to disabling Linkerd for Redis, so it is not really a solution.
The errors are from the Linkerd proxies in the Redis pods. As you may know, the Redis pods connect to each other both for Sentinel and for replication. From what I heard when things started failing, other apps using Redis also had connection issues.
But to investigate, it is simplest to just deploy Redis with the configuration above, and it will start failing right away. It works fine without Linkerd. It also works with the old version of Redis and Linkerd enabled, which was our configuration before we attempted the Redis upgrade.