Linkerd proxy is causing Redis to fail on upgrade to latest Redis version

Hi,

We are running Linkerd enterprise-2.16.1.
In our clusters we have Redis version 7.2.4 running with:

architecture: replication
sentinel:
  enabled: true

We use this helm chart:

This has been working fine. All Redis pods are meshed, and I see that the proxy-injector is also adding this annotation: config.linkerd.io/opaque-ports: "6379". We should probably add the Sentinel port 26379 there as well, but it has not been a problem.
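For reference, a minimal sketch of how both ports could be marked opaque via the chart's pod annotations (assuming the Bitnami chart's `replica.podAnnotations` passthrough; marking a port opaque tells the proxy to treat it as a raw TCP stream rather than attempting protocol detection):

```yaml
# Hypothetical values.yaml fragment: mark both the Redis port and the
# Sentinel port as opaque so the proxy skips protocol detection on them.
replica:
  podAnnotations:
    config.linkerd.io/opaque-ports: "6379,26379"
```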

Today I tried to upgrade Redis to version 7.4.1 with Helm chart version 20.2.1.
After the upgrade there are a lot of connection issues. Redis pods can no longer connect to each other, and applications cannot connect to Redis.
I see these types of errors in the linkerd-proxy log:

linkerd-proxy [   200.305512s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.241.75.1:46842 server.addr=10.241.73.252:6379
linkerd-proxy [   200.821799s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.241.75.1:52114 server.addr=10.241.73.252:26379

If I disable the Linkerd proxy for the Redis pods, then everything works fine. But since we are using Linkerd, we want to run all the pods in our application namespace in the service mesh.

I need some help figuring out what is going on and how to fix it.

Adding this annotation to the Redis pods also makes Redis work:
config.linkerd.io/skip-inbound-ports: "6379,26379"
This is not really a solution, though, since we want to run all pods in the service mesh with mTLS.

I think this looks like a bug in Linkerd.

It is very easy to reproduce. Create a test-values.yaml file with this content:

architecture: replication
sentinel:
  enabled: true
replica:
  podAnnotations:
    linkerd.io/inject: enabled
    config.linkerd.io/opaque-ports: "6379,26379"

Then install Redis using this command:

helm install redis oci://registry-1.docker.io/bitnamicharts/redis -f test-values.yaml

The Redis pods' proxy logs will now be flooded with errors like:
INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s
and the Redis pods cannot connect to each other.
@Flynn @william I would be grateful if someone could take a look at this issue.

I was wrong. It does not work with this annotation:
config.linkerd.io/skip-inbound-ports: "6379,26379"
The only way to make the latest version of Redis work is to disable the Linkerd proxy.

Mmm. I had one occasion where the skip-inbound-ports annotation did not help as a workaround. Now it works again. But this is equivalent to disabling Linkerd for Redis, so it is not really a solution.

Are those connection closed errors coming from the Redis proxies, or from the apps trying to connect to Redis?

The errors are from the Redis proxies. As you may know, the Redis pods connect to each other for both Sentinel and replication. From what I heard when things started failing, other apps using Redis also had connection issues.

But to investigate, it is simplest to just deploy Redis with the configuration above, and it will start failing right away. It works fine without Linkerd. It also works with the old version of Redis and Linkerd enabled, which was our configuration before we attempted the Redis upgrade.

Any updates on this? Any Linkerd user who is also running Redis will likely hit the same problem when upgrading to the latest Redis version.

Hi Jan, sounds like we were able to reproduce this locally, so the good news is it's not just you. If you have any ability to try different Redis versions to help us pinpoint the exact change that induced this behavior, that would be helpful. Otherwise we'll plug away and let you know what we find.

Hi,
I have some more info that may help you track down the issue.
Working versions of the Bitnami Redis Helm chart:
18.6.4 → redis:7.2.4-debian-11-r0
18.7.1 → redis:7.2.4-debian-11-r2
Non-working versions:
18.11.1 → redis:7.2.4-debian-11-r4 # Linkerd proxy fails to get into a ready state and eventually crashes
18.12.1 → redis:7.2.4-debian-11-r5 # Linkerd proxy starts but blocks traffic, so Redis is not working. Same as the latest Helm chart and Redis version


Any progress on investigation and a fix?

@william Any updates?

It is on our list but we haven’t had time to devote to it yet.

Hi @Jan, we just ran into this as well. For us, we managed to resolve it by disabling the NetworkPolicy, which the Bitnami Helm chart had silently enabled. Maybe this will also work for you?

architecture: replication
sentinel:
  enabled: true
networkPolicy:
  enabled: false
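If you would rather keep a NetworkPolicy in place than disable it entirely, one option is a supplementary policy that also admits the Linkerd proxy's ports. This is a sketch under the assumption that the chart's policy only allows the Redis ports (6379/26379) and therefore blocks meshed traffic, which arrives at the proxy's inbound port instead; 4143 is the proxy's default inbound port and 4191 its admin/probe port. The namespace and label selector here are assumptions to adjust for your release:

```yaml
# Hypothetical supplementary policy: allow traffic to reach the
# linkerd-proxy sidecar on the Redis pods. Selector and namespace
# must be adapted to match your actual Helm release.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-allow-linkerd-proxy
  namespace: redis
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: redis
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 4143   # linkerd-proxy inbound port (meshed traffic)
        - protocol: TCP
          port: 4191   # linkerd-proxy admin port (liveness/readiness probes)
```

This would keep network isolation for the namespace while still letting meshed peers and the kubelet reach the sidecar.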

Hi @Andrew. This fixed the issue for us as well. I completely missed that they had added a NetworkPolicy. Thank you very much for letting me know. I'm fine with closing this issue now.