Meshing a pod with Tailscale tsnet

Hello, I’ve been playing around with embedding Tailscale’s tsnet in a web server app that I’m running in my Kubernetes cluster with Linkerd installed.

However, when the pod is meshed, tsnet, which is embedded in my Go app, completely fails to connect to Tailscale, with logs like:

2023/05/20 08:54:07 logtail: dial "log.tailscale.io:443" failed: dial tcp 34.229.201.48:443: i/o timeout (in 30s), trying bootstrap...
control: bootstrapDNS(\"derp6.tailscale.com\", \"68.183.90.120\") for \"controlplane.tailscale.com\" error: Get \"https://derp6.tailscale.com/bootstrap-dns?q=controlplane.tailscale.com\": context deadline exceeded

Execing into the pod and curling with curl https://log.tailscale.io works fine. I appreciate tailscale is doing some clever things under the hood with userspace mode, but is anybody able to help me debug this?

Adding config.linkerd.io/skip-outbound-ports: "443" fixes the issue, but can anybody tell me why it fixes it?

Here are some linkerd-debug logs from around the time of one of the failed log.tailscale.io calls. I can see that a DNS response is received, but I don’t see a connection attempt to the resolved IP.

 8370 921.978266718  10.244.0.84 → 10.245.0.10  DNS 123 Standard query 0xd462 A log.tailscale.io.s-platform-auth.svc.cluster.local OPT
 8371 921.978386161  10.244.0.84 → 10.245.0.10  DNS 123 Standard query 0x0451 AAAA log.tailscale.io.s-platform-auth.svc.cluster.local OPT
 8372 921.978772319  10.245.0.10 → 10.244.0.84  DNS 216 Standard query response 0xd462 No such name A log.tailscale.io.s-platform-auth.svc.cluster.local SOA ns.dns.cluster.local OPT
 8373 921.980462054  10.245.0.10 → 10.244.0.84  DNS 216 Standard query response 0x0451 No such name AAAA log.tailscale.io.s-platform-auth.svc.cluster.local SOA ns.dns.cluster.local OPT
 8374 921.980715031  10.244.0.84 → 10.245.0.10  DNS 107 Standard query 0xe324 AAAA log.tailscale.io.svc.cluster.local OPT
 8375 921.980835689  10.244.0.84 → 10.245.0.10  DNS 107 Standard query 0xd4b3 A log.tailscale.io.svc.cluster.local OPT
 8376 921.982346896  10.245.0.10 → 10.244.0.84  DNS 200 Standard query response 0xe324 No such name AAAA log.tailscale.io.svc.cluster.local SOA ns.dns.cluster.local OPT
 8377 921.982385154  10.245.0.10 → 10.244.0.84  DNS 200 Standard query response 0xd4b3 No such name A log.tailscale.io.svc.cluster.local SOA ns.dns.cluster.local OPT
 8378 921.982600253  10.244.0.84 → 10.245.0.10  DNS 103 Standard query 0x9299 AAAA log.tailscale.io.cluster.local OPT
 8379 921.982708129  10.244.0.84 → 10.245.0.10  DNS 103 Standard query 0x0335 A log.tailscale.io.cluster.local OPT
 8380 921.983059238  10.245.0.10 → 10.244.0.84  DNS 196 Standard query response 0x0335 No such name A log.tailscale.io.cluster.local SOA ns.dns.cluster.local OPT
 8381 921.983214823  10.245.0.10 → 10.244.0.84  DNS 196 Standard query response 0x9299 No such name AAAA log.tailscale.io.cluster.local SOA ns.dns.cluster.local OPT
 8382 921.983561725  10.244.0.84 → 10.245.0.10  DNS 89 Standard query 0xd276 A log.tailscale.io OPT
 8383 921.983651775  10.244.0.84 → 10.245.0.10  DNS 89 Standard query 0xe9ae AAAA log.tailscale.io OPT
 8384 921.984671608  10.245.0.10 → 10.244.0.84  DNS 133 Standard query response 0xe9ae AAAA log.tailscale.io AAAA 2600:1f18:429f:9305:4043:217b:512c:f8d4 OPT
 8385 921.984962495  10.245.0.10 → 10.244.0.84  DNS 121 Standard query response 0xd276 A log.tailscale.io A 34.229.201.48 OPT
 8386 921.985402903  10.244.0.84 → 127.0.0.1    TCP 76 34822 → 4140 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=2904114136 TSecr=0 WS=128
 8387 923.019857479  10.244.0.84 → 127.0.0.1    TCP 76 [TCP Retransmission] 34822 → 4140 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=2904115170 TSecr=0 WS=128
 8388 923.572616803  10.244.0.84 → 10.244.0.84  TCP 68 40938 → 8000 [FIN, ACK] Seq=178 Ack=151 Win=65536 Len=0 TSval=1094190427 TSecr=1094185427
 8389 923.572746397  10.244.0.84 → 10.244.0.84  TCP 68 8000 → 40938 [FIN, ACK] Seq=151 Ack=179 Win=65536 Len=0 TSval=1094190427 TSecr=1094190427
 8390 923.572755916  10.244.0.84 → 10.244.0.84  TCP 68 40938 → 8000 [ACK] Seq=179 Ack=152 Win=65536 Len=0 TSval=1094190427 TSecr=1094190427
 8391 925.031868927  10.244.0.84 → 127.0.0.1    TCP 76 [TCP Retransmission] 34822 → 4140 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=2904117182 TSecr=0 WS=128
 8392 925.683194132  10.244.0.84 → 10.245.0.10  DNS 108 Standard query 0x4a01 SRV linkerd-dst-headless.linkerd.svc.cluster.local
 8393 925.685192118  10.245.0.10 → 10.244.0.84  DNS 308 Standard query response 0x4a01 SRV linkerd-dst-headless.linkerd.svc.cluster.local SRV 0 100 8086 10-244-1-119.linkerd-dst-headless.linkerd.svc.cluster.local A 10.244.1.119
 8394 926.686885398  10.244.0.84 → 10.245.0.10  DNS 102 Standard query 0x0823 SRV linkerd-policy.linkerd.svc.cluster.local
 8395 926.687209694  10.245.0.10 → 10.244.0.84  DNS 284 Standard query response 0x0823 SRV linkerd-policy.linkerd.svc.cluster.local SRV 0 100 8090 10-244-1-119.linkerd-policy.linkerd.svc.cluster.local A 10.244.1.119

Thanks,
Nick.

I think I’ve got to the bottom of this somewhat. Tailscale sets some socket options on the client that communicates with its control plane.

I think it was the BINDTODEVICE option in particular.

I have now fixed this with netns.SetEnabled(false), which prevents these options from being set.