Linkerd proxy is causing Redis to fail on upgrade to latest Redis version

Hi,

We are running Linkerd enterprise-2.16.1.
In our clusters we have Redis version 7.2.4 running with:

architecture: replication
sentinel:
  enabled: true

We use this helm chart:

This has been working fine. All Redis pods are meshed, and I see that the proxy-injector is also adding this annotation: config.linkerd.io/opaque-ports: "6379". We should probably add the Sentinel port 26379 there as well, but it has not been a problem.
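For reference, a minimal sketch of how both ports could be marked opaque via the chart's pod annotations (assuming the Bitnami chart's `replica.podAnnotations` passthrough; marking a port opaque tells the proxy to treat it as a raw TCP stream rather than attempting protocol detection):

```yaml
# Hypothetical values.yaml fragment: mark both the Redis port and the
# Sentinel port as opaque so the proxy skips protocol detection on them.
replica:
  podAnnotations:
    config.linkerd.io/opaque-ports: "6379,26379"
```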

Today I tried to upgrade Redis to version 7.4.1 with Helm chart version 20.2.1.
After the upgrade there are a lot of connection issues. Redis pods can no longer connect to each other, and applications cannot connect to Redis.
I see these types of errors in the linkerd-proxy log:

linkerd-proxy [   200.305512s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.241.75.1:46842 server.addr=10.241.73.252:6379
linkerd-proxy [   200.821799s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.241.75.1:52114 server.addr=10.241.73.252:26379

If I disable the Linkerd proxy for the Redis pods, then everything works fine. But since we are using Linkerd, we want to run all the pods in our application namespace in the service mesh.

I need some help figuring out what is going on and how to fix it.

Adding this annotation to the Redis pods also makes Redis work:
config.linkerd.io/skip-inbound-ports: "6379,26379"
This is not really a solution, though, since we want to run all pods in the service mesh with mTLS.

I think this looks like a bug in Linkerd.

It is very easy to reproduce. Create a test-values.yaml file with this content:

architecture: replication
sentinel:
  enabled: true
replica:
  podAnnotations:
    linkerd.io/inject: enabled
    config.linkerd.io/opaque-ports: "6379,26379"

Then install Redis using this command:

helm install redis oci://registry-1.docker.io/bitnamicharts/redis -f test-values.yaml

The Redis pods' proxy logs will now be flooded with errors like:
INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s
and the Redis pods cannot connect to each other.
@Flynn @william I would be grateful if someone could take a look at this issue.

I was wrong. It does not work with this annotation:
config.linkerd.io/skip-inbound-ports: "6379,26379"
The only way to make the latest version of Redis work is to disable the Linkerd proxy.

Mmm. I had one occasion where the skip-inbound-ports annotation did not help as a workaround. Now it works again. But this is equivalent to disabling Linkerd for Redis, so it is not really a solution.

Are those connection closed errors coming from the Redis proxies, or from the apps trying to connect to Redis?

The errors are from the Redis proxies. As you may know, the Redis pods connect to each other for both Sentinel and replication. From what I heard when things started failing, other apps using Redis also had connection issues.

But to investigate, it is simplest to just deploy Redis with the configuration above, and it will start failing right away. It works fine without Linkerd. It also works with the old version of Redis and Linkerd enabled, which was our configuration before we attempted the Redis upgrade.

Any updates on this? Any Linkerd user who is also running Redis will likely hit the same problem when upgrading to the latest Redis version.

Hi Jan, sounds like we were able to reproduce this locally, so the good news is it's not just you. If you have any ability to try different Redis versions to help us pinpoint the exact change that induced this behavior, that would be helpful. Otherwise we'll plug away and let you know what we find.

Hi,
I have some more info that may help you track down the issue.
Working versions of the Bitnami Redis Helm chart:
18.6.4 → redis:7.2.4-debian-11-r0
18.7.1 → redis:7.2.4-debian-11-r2
Non-working versions:
18.11.1 → redis:7.2.4-debian-11-r4 # Linkerd proxy fails to get into a ready state and eventually crashes
18.12.1 → redis:7.2.4-debian-11-r5 # Linkerd proxy starts but blocks traffic, so Redis is not working. Same as the latest Helm chart and Redis version


Any progress on investigation and a fix?

@william Any updates?

It is on our list but we haven’t had time to devote to it yet.

Hi @Jan, we just ran into this as well. For us, we managed to resolve it by disabling the NetworkPolicy, which the Bitnami Helm chart had silently enabled. Maybe this will also work for you?

architecture: replication
sentinel:
  enabled: true
networkPolicy:
  enabled: false
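If you would rather keep a NetworkPolicy in place than disable it entirely, one option is a supplementary policy that also admits the Linkerd proxy's ports. This is a sketch under the assumption that the chart's policy only allows the Redis ports (6379/26379) and therefore blocks meshed traffic, which arrives at the proxy's inbound port instead; 4143 is the proxy's default inbound port and 4191 its admin/probe port. The namespace and label selector here are assumptions to adjust for your release:

```yaml
# Hypothetical supplementary policy: allow traffic to reach the
# linkerd-proxy sidecar on the Redis pods. Selector and namespace
# must be adapted to match your actual Helm release.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-allow-linkerd-proxy
  namespace: redis
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: redis
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 4143   # linkerd-proxy inbound port (meshed traffic)
        - protocol: TCP
          port: 4191   # linkerd-proxy admin port (liveness/readiness probes)
```

This would keep network isolation for the namespace while still letting meshed peers and the kubelet reach the sidecar.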

Hi @Andrew. This fixed the issue for us as well. I completely missed that they had added a NetworkPolicy. Thank you very much for letting me know. I'm fine with closing this issue now.