Hi guys, I am trying to expand my mesh to non-k8s workloads. I followed this guide but needed to make some adjustments because I am working in a real-world SPIRE bubble with its own trust domain.
Scenario:
- managed k8s at cloud provider A running the linkerd control plane (edge version, installed with the Helm chart)
- integration with vault PKI running in cloud provider B (this works fine)
- separate setup (a bunch of VMs) at cloud provider B
- SPIRE server running here, hooked up to the same vault PKI intermediate
- I created a dedicated VM for the mesh expansion attempt at cloud B
- It has a containerized linkerd-proxy running next to a containerized spire-agent
- the node and workload attestation are working fine
The control plane is reachable from the linkerd-proxy (as far as I can tell from the logs). My understanding is that the control plane (the policy controller) rejects "something", responding with "unknown server":
WARN ThreadId(01) watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Some requested entity was not found grpc.message="unknown server"
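In case it narrows things down, these are the sanity checks I would run around that call (just a sketch; it assumes kubectl access to the cluster at provider A and netcat on the VM, and the resource names are the ones from my setup further down):

# does the control plane know about an ExternalWorkload for this VM?
kubectl get externalworkload myvm -n mixed-env -o yaml

# can the VM reach the policy controller port through the load balancer?
# (here.is.some.ip stands for the stackit_lb_ip placeholder used below)
nc -vz here.is.some.ip 8090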
There are quite a few knobs involved that I could imagine causing trouble, so I will provide what I consider relevant. I deploy my linkerd-proxy container with Ansible (if you are not familiar with it, let me know and I will explain):
- name: Launch proxy container
  community.docker.docker_container:
    name: linkerd-proxy
    […]
    env:
      LINKERD2_PROXY_IDENTITY_SERVER_NAME: "{{ inventory_hostname }}.cluster.local"
      LINKERD2_PROXY_IDENTITY_SERVER_ID: spiffe://{{ spire_trust_domain }}/{{ inventory_hostname }}/linkerd-proxy
      LINKERD2_PROXY_DESTINATION_CONTEXT: "{{ dest_ctx | to_json }}"
      LINKERD2_PROXY_POLICY_WORKLOAD: "{{ policy_workload | to_json }}"
      LINKERD2_PROXY_DESTINATION_SVC_NAME: linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_DESTINATION_SVC_ADDR: linkerd-dst-headless.linkerd.svc.cluster.local:8086
      LINKERD2_PROXY_POLICY_SVC_ADDR: linkerd-policy.linkerd.svc.cluster.local:8090
      LINKERD2_PROXY_POLICY_SVC_NAME: linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
      LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS: "{{ trust_bundle_fetch_result.stdout }}"
      LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET: unix:///home/proxy/api.sock
      LINKERD2_PROXY_INBOUND_DEFAULT_POLICY: all-authenticated
    etc_hosts:
      linkerd-dst-headless.linkerd.svc.cluster.local: "{{ stackit_lb_ip }}"
      linkerd-policy.linkerd.svc.cluster.local: "{{ stackit_lb_ip }}"
    network_mode: host
    volumes:
      - /opt/spire_agent/api.sock:/home/proxy/api.sock
    pid_mode: host
    labels: "{{ {spire_container_workload_label: ''} }}"
  vars:
    stackit_lb_ip: here.is.some.ip
    k8s_ns: mixed-env
    dest_ctx:
      ns: "{{ k8s_ns }}"
      nodeName: "{{ inventory_hostname }}"
      external_workload: "{{ inventory_hostname }}"
    policy_workload:
      ns: "{{ k8s_ns }}"
      external_workload: "{{ inventory_hostname }}"
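To rule out problems with the socket mount itself, I can ask the agent for an SVID directly on the host (just a sketch; the spire-agent binary path is specific to my install, and this call gets attested as my shell rather than as the proxy container, so the returned identity will differ):

# confirm the workload API answers on the socket that is mounted into the proxy container
/opt/spire/bin/spire-agent api fetch x509 -socketPath /opt/spire_agent/api.sock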
Effective env vars for the proxy:
LINKERD2_PROXY_IDENTITY_SERVER_NAME=myvm.cluster.local
LINKERD2_PROXY_IDENTITY_SERVER_ID=spiffe://my.trust.domain/myvm/linkerd-proxy
LINKERD2_PROXY_DESTINATION_CONTEXT={\"ns\": \"mixed-env\", \"nodeName\": \"myvm\", \"external_workload\": \"myvm\"}
LINKERD2_PROXY_POLICY_WORKLOAD={\"ns\": \"mixed-env\", \"external_workload\": \"myvm\"}
LINKERD2_PROXY_DESTINATION_SVC_NAME=linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
LINKERD2_PROXY_DESTINATION_SVC_ADDR=linkerd-dst-headless.linkerd.svc.cluster.local:8086
LINKERD2_PROXY_POLICY_SVC_ADDR=linkerd-policy.linkerd.svc.cluster.local:8090
LINKERD2_PROXY_POLICY_SVC_NAME=linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS=-----BEGIN CERTIFICATE-----\n…\n-----END CERTIFICATE-----
LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET=unix:///home/proxy/api.sock
LINKERD2_PROXY_INBOUND_DEFAULT_POLICY=all-authenticated
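(For reference, the list above can be reproduced from the running container roughly like this; jq is only there for readability and is optional:)

# dump the effective environment of the running container
docker inspect linkerd-proxy --format '{{ json .Config.Env }}' | jq -r '.[]'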
I translated the iptables commands from the tutorial to make them work with firewalld (which seems to be working fine):
#!/bin/bash
PROXY_INBOUND_PORT=4143
PROXY_OUTBOUND_PORT=4140
PROXY_USER_UID=$(id -u linkerd-proxy)
# default inbound and outbound ports to ignore
INBOUND_PORTS_TO_IGNORE="4190,4191,4567,4568,22"
OUTBOUND_PORTS_TO_IGNORE="4567,4568,22,443,9200,5601,8220"
# prepare chains
firewall-cmd --permanent --direct --add-chain ipv4 nat PROXY_INIT_REDIRECT
firewall-cmd --permanent --direct --add-chain ipv4 nat PROXY_INIT_OUTPUT
firewall-cmd --permanent --direct --add-rule ipv4 nat PREROUTING 0 -j PROXY_INIT_REDIRECT
firewall-cmd --permanent --direct --add-rule ipv4 nat OUTPUT 0 -j PROXY_INIT_OUTPUT
# inbound
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_REDIRECT 0 -p tcp -m multiport --dports $INBOUND_PORTS_TO_IGNORE -j RETURN
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_REDIRECT 0 -p tcp -j REDIRECT --to-ports $PROXY_INBOUND_PORT
# outbound
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_OUTPUT 0 -m owner --uid-owner $PROXY_USER_UID -j RETURN
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_OUTPUT 0 -o lo -j RETURN
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_OUTPUT 0 -p tcp -m multiport --dports $OUTBOUND_PORTS_TO_IGNORE -j RETURN
firewall-cmd --permanent --direct --add-rule ipv4 nat PROXY_INIT_OUTPUT 0 -p tcp -j REDIRECT --to-ports $PROXY_OUTBOUND_PORT
firewall-cmd --reload
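To double-check the translation, I compare what firewalld reports with what actually lands in the kernel's nat table (a sketch, run as root on the VM):

# what firewalld thinks it installed
firewall-cmd --direct --get-all-rules

# what actually ended up in the nat table
iptables -t nat -L PROXY_INIT_REDIRECT -n --line-numbers
iptables -t nat -L PROXY_INIT_OUTPUT -n --line-numbers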
The proxy log:
INFO linkerd2_proxy: release 2.290.0 (686934c) by linkerd on 2025-04-01T17:52:29Z
INFO linkerd_app::env: `LINKERD2_PROXY_INBOUND_IPS` allowlist not configured, allowing all target addresses
INFO linkerd_app::env: LINKERD2_PROXY_POLICY_CLUSTER_NETWORKS not set; cluster-scoped modes are unsupported
INFO linkerd2_proxy::rt: Using single-threaded proxy runtime
INFO linkerd2_proxy: Admin interface on 127.0.0.1:4191
INFO linkerd2_proxy: Inbound interface on 0.0.0.0:4143
INFO linkerd2_proxy: Outbound interface on 127.0.0.1:4140
INFO linkerd2_proxy: Tap DISABLED
INFO linkerd2_proxy: SNI is myvm.cluster.local
INFO linkerd2_proxy: Local identity is spiffe://my.trust.domain/myvm/linkerd-proxy
INFO linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
INFO dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=here.is.some.ip:8086
INFO policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=here.is.some.ip:8090
INFO daemon:identity: linkerd_app: Certified identity id=spiffe://my.trust.domain/myvm/linkerd-proxy
WARN watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Some requested entity was not found grpc.message="unknown server"
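Apart from that warning the proxy seems to start up fine. For completeness, this is the kind of local check I run against the admin port (a sketch; it assumes curl is installed on the VM):

# poke the proxy's admin endpoint locally (4191 is excluded from the redirect rules above)
curl -s http://127.0.0.1:4191/live
curl -s http://127.0.0.1:4191/ready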
Note: my SPIRE bubble has its own trust domain, which is different from linkerd's cluster domain. I suspect this could cause trouble.
The VM/node (myvm) is attested by SPIRE, and so is a (Docker) workload running on it (the proxy container linkerd-proxy). I was mostly guessing at how this is supposed to work, i.e. what to tell the linkerd control plane about who is calling, so there is some potential for failure there as well.
So, from my understanding, the proxy can successfully contact the control plane, but something is off that causes it to reject the proxy. I have no real workload running behind the linkerd-proxy yet; I would start one on port 80 like the tutorial does. Here is my resource on the k8s end:
---
apiVersion: workload.linkerd.io/v1alpha1
kind: ExternalWorkload
metadata:
  name: myvm
  namespace: mixed-env
  labels:
    location: hcloud-vm
    app: hcloud-app
    workload_name: myvm
spec:
  meshTLS:
    identity: "spiffe://my.trust.domain/myvm/linkerd-proxy"
    serverName: "myvm.cluster.local"
  workloadIPs:
    - ip: wan.ip.of.myvm
  ports:
    - port: 80
      name: http
status:
  conditions:
    - type: Ready
      status: "True"
---
apiVersion: v1
kind: Service
metadata:
  name: myvm
  namespace: mixed-env
spec:
  type: ClusterIP
  selector:
    workload_name: myvm
  ports:
    - port: 80
      protocol: TCP
      name: http
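Once this is applied, I would verify the wiring on the cluster side like this (a sketch; I am assuming Linkerd's external workload machinery manages an EndpointSlice labelled with the service name, the same way the built-in endpoints controller does):

# does the Service get an EndpointSlice pointing at the VM's IP?
kubectl get endpointslices -n mixed-env -l kubernetes.io/service-name=myvm -o wide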
I have not added any DNS entries yet. The guide said that would only be necessary to allow my external workload to resolve in-cluster workloads, and I am not there yet.
I am not sure what identity the linkerd control plane expects to be attested: the node/host, the linkerd-proxy, or whatever workload I will spin up on that host later on? And how is that connected to the different ports I bind on the host? I guess I could run multiple container workloads on that VM and they could all access the mesh or expose their own ports to the mesh, but would they all share one identity?
Looking forward to some guidance! I guess I misunderstood something somewhere.