TLS not working as expected

Hi all!

We adopted Linkerd to support TLS termination between pods inside our Kubernetes cluster. We have some services that communicate with each other using gRPC over a secure connection.

When the services communicate using the public FQDN, everything works fine because TLS is terminated at the AWS NLB level, but we would also like to support communication over the Kubernetes network using the Kubernetes service hostname (service.namespace.svc.cluster.local). After reading the documentation, we thought that once you inject linkerd-proxy you can communicate securely without any additional configuration.

Did we misunderstand, and do we need to configure something additional, such as routing traffic from the Kubernetes service to linkerd-proxy instead of the microservice container?

Any advice is appreciated. Thanks!

Welcome @YuriiP!

In general, cloud load balancers can’t be meshed: there’s not anything running inside the cluster to which Linkerd can attach its proxy. That means that the first hop from the LB to a meshed Pod will be unencrypted. All the communications between meshed Pods will still be encrypted; it’s just that first hop which is cleartext.

The most straightforward way to get everything encrypted is to not have the NLB terminate TLS. Instead, have the NLB direct traffic to a meshed Kubernetes ingress controller, and let the ingress controller terminate TLS.

The "Handling ingress traffic" page in the Linkerd documentation has some more details.
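
As a rough sketch of what meshing an ingress controller could look like, the usual pattern is to pipe its Deployment through `linkerd inject` and re-apply it. The namespace and Deployment name below are hypothetical placeholders, and a given ingress controller may need extra settings beyond this.

```shell
# Hypothetical names; substitute your ingress controller's namespace and Deployment.
INGRESS_NS="ingress-nginx"
INGRESS_DEPLOY="ingress-nginx-controller"

# Fetch the Deployment, let `linkerd inject` add the proxy injection
# annotation to the manifest, and re-apply the result.
mesh_ingress() {
  kubectl get deployment "$INGRESS_DEPLOY" -n "$INGRESS_NS" -o yaml \
    | linkerd inject - \
    | kubectl apply -f -
}

# Run against your cluster with: mesh_ingress
```

With the ingress controller meshed and terminating TLS itself, the NLB would only need to pass TCP traffic through to it.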

@YuriiP Any communication between meshed pods in a Kubernetes cluster will be mTLS’d without you having to do any further configuration. You should not have to manually route traffic from K8s services to linkerd-proxy; that is all taken care of for you when you mesh the pods.

Are you seeing behavior to the contrary?

Hi @William. Yes, it seems we are seeing the contrary behaviour: all gRPC requests between meshed pods fail, whether we use the internal Kubernetes service domain or the pod IP.

Please provide some log lines and further details about the failures. These calls should just work.

@william

OK. There are two pods, service-1 and service-2, both in the same namespace (services).

Executing the following command in the service-1 pod:

grpcurl service-2.example.com:443 HelloWorld

returns the following error: Error invoking method "HelloWorld": given method name "HelloWorld" is not in expected format: 'service/method' or 'service.method'. That is fine, since the request format is not correct; the important thing is that the gRPC request itself reached the target successfully. As you can see, it works when we use the public FQDN, since TLS is terminated on the NLB side.

Let's try the same command, but using the internal Kubernetes service domain:

grpcurl service-2.services.svc.cluster.local:7233 HelloWorld

Response is: Failed to dial target host "service-2.services.svc.cluster.local:7233": tls: first record does not look like a TLS handshake

service-2 linkerd-proxy logs:

[     0.003504s]  INFO ThreadId(01) linkerd2_proxy: release 2.210.0 (85db2fc) by linkerd on 2023-09-21T21:24:58Z
[     0.004292s]  INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
[     0.004986s]  INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.005003s]  INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.005006s]  INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.005009s]  INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
[     0.005012s]  INFO ThreadId(01) linkerd2_proxy: Local identity is default.temporal.serviceaccount.identity.linkerd.cluster.local
[     0.005014s]  INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.005017s]  INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.034203s]  INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=default.temporal.serviceaccount.identity.linkerd.cluster.local
[     0.713850s]  INFO ThreadId(01) outbound:proxy{addr=10.43.238.84:7233}:service{ns= name=service port=0}: linkerd_proxy_api_resolve::resolve: No endpoints
[     1.796331s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.1.1.239:47054
[    32.802473s]  INFO ThreadId(01) outbound: linkerd_app_core::serve: Connection closed error=connect timed out after 1s client.addr=10.1.1.239:35382

linkerd-check output:

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks contains all pods
√ cluster networks contains all services

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
    is running version 2.13.5 but the latest stable version is 2.14.1
    see https://linkerd.io/2.13/checks/#l5d-version-cli for hints

control-plane-version
---------------------
√ can retrieve the control plane version
√ control plane is up-to-date
‼ control plane and cli versions match
    control plane running stable-2.14.1 but cli running stable-2.13.5
    see https://linkerd.io/2.13/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
√ control plane proxies are up-to-date
‼ control plane proxies and cli versions match
    linkerd-destination-5476c5bc64-bjnkn running stable-2.14.1 but cli running stable-2.13.5
    see https://linkerd.io/2.13/checks/#l5d-cp-proxy-cli-version for hints

linkerd-multicluster
--------------------
× Link CRD exists
    multicluster.linkerd.io/Link CRD is missing: the server could not find the requested resource
    see https://linkerd.io/2.13/checks/#l5d-multicluster-link-crd-exists for hints

linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ can initialize the client
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ viz extension proxies are healthy
√ viz extension proxies are up-to-date
‼ viz extension proxies and cli versions match
    metrics-api-6bbd6d4bc7-4cnrn running stable-2.14.1 but cli running stable-2.13.5
    see https://linkerd.io/2.13/checks/#l5d-viz-proxy-cli-version for hints
√ prometheus is installed and configured correctly
√ viz extension self-check

Status check results are ×

Does the command grpcurl service-2.services.svc.cluster.local:7233 HelloWorld work without Linkerd?

No, it doesn't, because there is no TLS. It works only if I add the -plaintext option to skip the secure connection.

Hey @YuriiP,

That’s super useful information! The short answer is that this is what you would expect: while Linkerd encrypts your traffic in transit, it is transparent to your application. The longer answer is that when you use Linkerd, you should continue to run your apps as if they are sending data in plain text. Linkerd’s proxy encrypts the data from proxy to proxy. The traffic from app to proxy and from proxy to app should always be in plain text.
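
In practice, that could look something like the sketch below: call the service in plain text from a meshed pod, then ask the viz extension whether the pod-to-pod connection is secured. The hostname and port are the ones quoted earlier in the thread; using `list` assumes the server exposes gRPC reflection, and the `deployment` resource type assumes the workloads are Deployments.

```shell
# Hostname and port are the ones quoted earlier in the thread.
TARGET="service-2.services.svc.cluster.local:7233"

# Call the service without client-side TLS; the Linkerd proxies add mTLS
# between the pods. "list" relies on gRPC reflection being enabled on the
# server, which is an assumption here.
list_services() {
  grpcurl -plaintext "$TARGET" list
}

# Ask the viz extension which connections between workloads in the
# "services" namespace are secured by Linkerd (assumes Deployments).
show_edges() {
  linkerd viz edges deployment -n services
}

# Run from a meshed pod / against your cluster with: list_services; show_edges
```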

Thanks,
Jason

Hey @jmo. Oh, I misunderstood how Linkerd works, but after your explanation it makes sense now.

Thank you!