Mesh Expansion fails due to "unknown server"

Hi guys, I am trying to expand my mesh to non-k8s workloads. I followed this guide but needed to make some adjustments because I am in a real-world SPIRE bubble with its own trust domain.

Scenario:

  • managed k8s at cloud provider A running the linkerd control plane (edge version, installed with helm chart)
    • integration with vault PKI running in cloud provider B (this works fine)
  • separate setup (a bunch of VMs) at a second cloud provider, B
    • SPIRE server running here, hooked up to the same vault PKI intermediate
  • I created a dedicated VM for the mesh expansion attempt at cloud B
    • It has a containerized linkerd-proxy running next to a containerized spire-agent
    • the node and workload attestation are working fine

The control plane is reachable from the linkerd-proxy (as far as I can tell from the logs). My understanding is that the control plane (the policy controller) rejects “something”, responding with “unknown server”:

WARN ThreadId(01) watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Some requested entity was not found grpc.message="unknown server"

There are quite a few knobs involved that I could imagine causing trouble, so I will provide what I consider relevant. I deploy my linkerd-proxy container with ansible (if you cannot read it, there is a rough docker run equivalent right after the task, and I am happy to explain further):

- name: Launch proxy container
  community.docker.docker_container:
    name: linkerd-proxy
    […]
    env:
      LINKERD2_PROXY_IDENTITY_SERVER_NAME: "{{ inventory_hostname }}.cluster.local"
      LINKERD2_PROXY_IDENTITY_SERVER_ID: spiffe://{{ spire_trust_domain }}/{{ inventory_hostname }}/linkerd-proxy
      LINKERD2_PROXY_DESTINATION_CONTEXT: "{{ dest_ctx | to_json }}"
      LINKERD2_PROXY_POLICY_WORKLOAD: "{{ policy_workload | to_json }}"
      LINKERD2_PROXY_DESTINATION_SVC_NAME: linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_DESTINATION_SVC_ADDR: linkerd-dst-headless.linkerd.svc.cluster.local:8086
      LINKERD2_PROXY_POLICY_SVC_ADDR: linkerd-policy.linkerd.svc.cluster.local:8090
      LINKERD2_PROXY_POLICY_SVC_NAME: linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS: "{{ trust_bundle_fetch_result.stdout }}"
      LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET: unix:///home/proxy/api.sock
      LINKERD2_PROXY_INBOUND_DEFAULT_POLICY: all-authenticated
    etc_hosts:
      linkerd-dst-headless.linkerd.svc.cluster.local: "{{ stackit_lb_ip }}"
      linkerd-policy.linkerd.svc.cluster.local: "{{ stackit_lb_ip }}"
    network_mode: host
    volumes:
      - /opt/spire_agent/api.sock:/home/proxy/api.sock
    pid_mode: host
    labels: "{{ {spire_container_workload_label: ''} }}"
  vars:
    stackit_lb_ip: here.is.some.ip
    k8s_ns: mixed-env
    dest_ctx:
      ns: "{{ k8s_ns }}"
      nodeName: "{{ inventory_hostname }}"
      external_workload: "{{ inventory_hostname }}"
    policy_workload:
      ns: "{{ k8s_ns }}"
      external_workload: "{{ inventory_hostname }}"

Effective env vars for the proxy:

LINKERD2_PROXY_IDENTITY_SERVER_NAME=myvm.cluster.local
LINKERD2_PROXY_IDENTITY_SERVER_ID=spiffe://my.trust.domain/myvm/linkerd-proxy
LINKERD2_PROXY_DESTINATION_CONTEXT={\"ns\": \"mixed-env\", \"nodeName\": \"myvm\", \"external_workload\": \"myvm\"}
LINKERD2_PROXY_POLICY_WORKLOAD={\"ns\": \"mixed-env\", \"external_workload\": \"myvm\"}
LINKERD2_PROXY_DESTINATION_SVC_NAME=linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
LINKERD2_PROXY_DESTINATION_SVC_ADDR=linkerd-dst-headless.linkerd.svc.cluster.local:8086
LINKERD2_PROXY_POLICY_SVC_ADDR=linkerd-policy.linkerd.svc.cluster.local:8090
LINKERD2_PROXY_POLICY_SVC_NAME=linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS=-----BEGIN CERTIFICATE-----\n…\n-----END CERTIFICATE-----
LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET=unix:///home/proxy/api.sock
LINKERD2_PROXY_INBOUND_DEFAULT_POLICY=all-authenticated
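
My understanding is that LINKERD2_PROXY_POLICY_WORKLOAD tells the policy controller which ExternalWorkload to look up for this proxy, so I make sure a resource with exactly that name exists in that namespace (the manifest is further down):

kubectl get externalworkloads.workload.linkerd.io myvm -n mixed-env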

I translated the iptables commands from the tutorial so that they work with firewalld (which seems to be working fine):

#!/bin/bash

PROXY_INBOUND_PORT=4143
PROXY_OUTBOUND_PORT=4140
PROXY_USER_UID=$(id -u linkerd-proxy)
# default inbound and outbound ports to ignore
INBOUND_PORTS_TO_IGNORE="4190,4191,4567,4568,22"
OUTBOUND_PORTS_TO_IGNORE="4567,4568,22,443,9200,5601,8220"

# prepare chains
firewall-cmd --permanent --direct --add-chain ipv4 nat PROXY_INIT_REDIRECT
firewall-cmd --permanent --direct --add-chain ipv4 nat PROXY_INIT_OUTPUT
firewall-cmd --permanent --direct --add-rule ipv4 nat PREROUTING 0 -j PROXY_INIT_REDIRECT
firewall-cmd --permanent --direct --add-rule ipv4 nat OUTPUT 0 -j PROXY_INIT_OUTPUT

# inbound
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_REDIRECT 0 -p tcp -m multiport --dports $INBOUND_PORTS_TO_IGNORE -j RETURN
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_REDIRECT 0 -p tcp -j REDIRECT --to-ports $PROXY_INBOUND_PORT

# outbound
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_OUTPUT 0 -m owner --uid-owner $PROXY_USER_UID -j RETURN
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_OUTPUT 0 -o lo -j RETURN
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_OUTPUT 0 -p tcp -m multiport --dports $OUTBOUND_PORTS_TO_IGNORE -j RETURN
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_OUTPUT 0 -p tcp -j REDIRECT --to-ports $PROXY_OUTBOUND_PORT

firewall-cmd --reload
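
For what it's worth, I double-check what actually got programmed after the reload, both from the firewalld side and from the resulting iptables nat chains:

firewall-cmd --direct --get-all-rules
iptables -t nat -nL PROXY_INIT_REDIRECT
iptables -t nat -nL PROXY_INIT_OUTPUT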

The proxy log:

INFO linkerd2_proxy: release 2.290.0 (686934c) by linkerd on 2025-04-01T17:52:29Z
INFO linkerd_app::env: `LINKERD2_PROXY_INBOUND_IPS` allowlist not configured, allowing all target addresses
INFO linkerd_app::env: LINKERD2_PROXY_POLICY_CLUSTER_NETWORKS not set; cluster-scoped modes are unsupported
INFO linkerd2_proxy::rt: Using single-threaded proxy runtime
INFO linkerd2_proxy: Admin interface on 127.0.0.1:4191
INFO linkerd2_proxy: Inbound interface on 0.0.0.0:4143
INFO linkerd2_proxy: Outbound interface on 127.0.0.1:4140
INFO linkerd2_proxy: Tap DISABLED
INFO linkerd2_proxy: SNI is myvm.cluster.local
INFO linkerd2_proxy: Local identity is spiffe://my.trust.domain/myvm/linkerd-proxy
INFO linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
INFO dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=here.is.some.ip:8086
INFO policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=here.is.some.ip:8090
INFO daemon:identity: linkerd_app: Certified identity id=spiffe://my.trust.domain/myvm/linkerd-proxy
WARN watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Some requested entity was not found grpc.message="unknown server"

Note: my spire bubble has its own trust domain, which is different from linkerd’s cluster domain. I suspect this could cause trouble.
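
One way to compare that against what the control plane was installed with is something like the following (assuming the default linkerd-config ConfigMap in the linkerd namespace and a helm release named linkerd-control-plane; adjust if yours differ):

kubectl -n linkerd get configmap linkerd-config -o yaml | grep -i trustdomain
helm -n linkerd get values linkerd-control-plane | grep -i trust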

The VM/node (myvm) is attested by spire, and so is a (docker) workload running on it (the proxy container linkerd-proxy). I was partly guessing at how this is supposed to work, i.e. what to tell the linkerd control plane about who is calling, so there is some potential for failure here as well; the registration entries look roughly like the sketch below.
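
A sketch of those registration entries (placeholders in angle brackets; the exact parent ID depends on the node attestor, and the selector on the docker label set on the container above):

# workload entry: maps the docker label on the proxy container to the
# SPIFFE ID the proxy presents to the control plane
spire-server entry create \
  -parentID spiffe://my.trust.domain/spire/agent/<node-attestor>/<node-id> \
  -spiffeID spiffe://my.trust.domain/myvm/linkerd-proxy \
  -selector docker:label:<spire_container_workload_label>:<value>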

So, from my understanding, the proxy can successfully contact the control plane, but something is off that causes it to be rejected. I have no real workload running behind the linkerd-proxy yet; I would start it on port 80 like the tutorial does. Here are my resources on the k8s side:

---
apiVersion: workload.linkerd.io/v1alpha1
kind: ExternalWorkload
metadata:
  name: myvm
  namespace: mixed-env
  labels:
    location: hcloud-vm
    app: hcloud-app
    workload_name: myvm
spec:
  meshTls:
    identity: "spiffe://my.trust.domain/myvm/linkerd-proxy"
    serverName: "myvm.cluster.local"
  workloadIPs:
    - ip: wan.ip.of.myvm
  ports:
    - port: 80
      name: http
status:
  conditions:
    - type: Ready
      status: "True"
---
apiVersion: v1
kind: Service
metadata:
  name: myvm
  namespace: mixed-env
spec:
  type: ClusterIP
  selector:
    workload_name: myvm
  ports:
    - port: 80
      protocol: TCP
      name: http
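
As a side note, my understanding is that once the ExternalWorkload is accepted, the control plane creates EndpointSlices for this Service pointing at the VM; I plan to verify that with the standard service-name label on the slices:

kubectl get endpointslices -n mixed-env -l kubernetes.io/service-name=myvm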

I have not added any DNS entries yet. The guide said that would only be necessary to let my external workload resolve in-cluster workloads, and I am not there yet.

I am not sure what identity the linkerd control plane expects to be attested: the node/host, the linkerd-proxy, or whatever workload I will spin up on that host later on? And how would that be connected to the different ports I bind on the host? I guess I could run multiple container workloads on that VM, and they could all access the mesh or expose their own ports to the mesh, but would they all share one identity?

Looking forward to some guidance! I guess I misunderstood something somewhere.

The solution might disappoint you, but apparently I solved it myself. The guide that I followed used an obsolete version of the ExternalWorkload resource (workload.linkerd.io/v1alpha1 instead of workload.linkerd.io/v1beta1). The policy controller rejected the resource due to the “missing” field meshTLS. :roll_eyes:

INFO external_workloads: kubert::errors: stream failed error=failed to perform initial object list: Error deserializing response: missing field `meshTLS` at line 1 column 1697

It turns out the field casing changed from alpha to beta (meshTls vs. meshTLS). It would be great if anyone with the necessary access could update the guide to avoid misleading more people.
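
For anyone landing here later, the working resource is simply the one above with those two changes applied, i.e. roughly this (status handling unchanged, so omitted):

---
apiVersion: workload.linkerd.io/v1beta1
kind: ExternalWorkload
metadata:
  name: myvm
  namespace: mixed-env
  labels:
    location: hcloud-vm
    app: hcloud-app
    workload_name: myvm
spec:
  meshTLS:
    identity: "spiffe://my.trust.domain/myvm/linkerd-proxy"
    serverName: "myvm.cluster.local"
  workloadIPs:
    - ip: wan.ip.of.myvm
  ports:
    - port: 80
      name: http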

I’m not sure if I’m there yet, but the error described in this thread is gone and everything appears to be working fine.