ERROR linkerd-network-validator in AIR Gapped Installation

Good Day All,

I am having issues installing Linkerd in an air-gapped AWS EKS (1.28) cluster. The images are all being pulled from AWS ECR. The pods are all in Init:CrashLoopBackOff, and when I look at them in k9s (linkerd-destination-xxx, linkerd-identity-xxx, linkerd-proxy-injector-xxx), every container says it is initializing except "linkerd_network_validator", which says: "Failed to validate networking configuration. Please ensure iptables rules are rewriting traffic as expected. timeout=10s".

I am installing with Helm, using --set proxyInit.runAsRoot=true. Any assistance would be appreciated.

The network validator is a process that runs when you configure Linkerd to use linkerd-cni instead of the proxy-init container. It checks that iptables is properly configured before allowing the pod to start. If that process is failing, it almost certainly means that the CNI did not properly configure the pod network. I suggest simplifying your configuration to use proxy-init instead of linkerd-cni to get to a working state.
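To make the check described above more concrete, here is a simplified Python sketch of the idea, not the actual linkerd-network-validator code. It assumes the validator listens on one address, opens a connection to a second address, and passes only if that connection arrives at its own listener (which, in a real pod, only happens when iptables rewrites the traffic). The function name `validate_redirect` and the token exchange are illustrative assumptions.

```python
import secrets
import socket
import threading


def validate_redirect(listen_addr, connect_addr, timeout=10.0):
    """Return True if a connection to connect_addr arrives at our own listener.

    Hypothetical sketch: in a pod, iptables would be what redirects
    connect_addr to listen_addr. Nothing here touches iptables itself.
    """
    token = secrets.token_hex(16).encode()
    received = []

    server = socket.create_server(listen_addr)
    server.settimeout(timeout)

    def accept_once():
        try:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(timeout)
                data = b""
                while len(data) < len(token):
                    chunk = conn.recv(len(token) - len(data))
                    if not chunk:
                        break
                    data += chunk
                received.append(data)
        except OSError:
            pass  # accept/recv timed out or the listener was closed

    worker = threading.Thread(target=accept_once, daemon=True)
    worker.start()
    try:
        with socket.create_connection(connect_addr, timeout=timeout) as client:
            client.sendall(token)
    except OSError:
        return False  # refused, unreachable, or timed out
    finally:
        worker.join(timeout)
        server.close()
    return received == [token]
```

Without a redirect in place, the connection either fails or lands somewhere else, and the check reports failure after the timeout. That matches the symptom in this thread: the error says nothing about which rule is missing, only that the traffic did not come back.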

Thank you. I tried this without linkerd-cni as suggested and am still getting the same error: "Connecting to 1.1.1.1:20001. Failed to validate networking configuration." Am I doing something other than what you suggested? Any ideas?

Isn't 1.1.1.1 Cloudflare's DNS? You mentioned you're in an air-gapped environment, so perhaps one of your services is trying to reach outside?

All I am doing is changing the image location in the YAML file from the internet to the ECR registry. These errors are also coming up:

2023-11-13T19:32:37.078557Z INFO httproutes.policy.linkerd.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: "404 page not found\n": Fai
2023-11-13T19:32:41.272202Z WARN httproutes.gateway.networking.k8s.io: kube_runtime::watcher: watch list error with 403: Api(ErrorResponse { status: "Failure", message: "httpr
2023-11-13T19:32:41.272234Z INFO httproutes.gateway.networking.k8s.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: httproutes.gateway.
2023-11-13T19:32:42.081713Z WARN httproutes.policy.linkerd.io: kube_client::client: Unsuccessful data error parse: 404 page not found
2023-11-13T19:32:42.081743Z INFO httproutes.policy.linkerd.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: "404 page not found\n": Fai
2023-11-13T19:32:46.274441Z WARN httproutes.gateway.networking.k8s.io: kube_runtime::watcher: watch list error with 403: Api(ErrorResponse { status: "Failure", message: "httpr
2023-11-13T19:32:46.274471Z INFO httproutes.gateway.networking.k8s.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: httproutes.gateway.
2023-11-13T19:32:47.084928Z WARN httproutes.policy.linkerd.io: kube_client::client: Unsuccessful data error parse: 404 page not found
2023-11-13T19:32:47.084963Z INFO httproutes.policy.linkerd.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: "404 page not found\n": Fai
2023-11-13T19:32:51.277525Z WARN httproutes.gateway.networking.k8s.io: kube_runtime::watcher: watch list error with 403: Api(ErrorResponse { status: "Failure", message: "httpr
2023-11-13T19:32:51.277557Z INFO httproutes.gateway.networking.k8s.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: httproutes.gateway.
2023-11-13T19:32:52.087652Z WARN httproutes.policy.linkerd.io: kube_client::client: Unsuccessful data error parse: 404 page not found
2023-11-13T19:32:52.087793Z INFO httproutes.policy.linkerd.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: "404 page not found\n": Fai
2023-11-13T19:32:56.281626Z WARN httproutes.gateway.networking.k8s.io: kube_runtime::watcher: watch list error with 403: Api(ErrorResponse { status: "Failure", message: "httpr
2023-11-13T19:32:56.281659Z INFO httproutes.gateway.networking.k8s.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: httproutes.gateway.
2023-11-13T19:32:57.090363Z WARN httproutes.policy.linkerd.io: kube_client::client: Unsuccessful data error parse: 404 page not found
2023-11-13T19:32:57.090458Z INFO httproutes.policy.linkerd.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: "404 page not found\n": Fai
2023-11-13T19:33:01.285167Z WARN httproutes.gateway.networking.k8s.io: kube_runtime::watcher: watch list error with 403: Api(ErrorResponse { status: "Failure", message: "httpr
2023-11-13T19:33:01.285201Z INFO httproutes.gateway.networking.k8s.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: httproutes.gateway.
2023-11-13T19:33:02.092683Z WARN httproutes.policy.linkerd.io: kube_client::client: Unsuccessful data error parse: 404 page not found
2023-11-13T19:33:02.092717Z INFO httproutes.policy.linkerd.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: "404 page not found\n": Fai
2023-11-13T19:33:06.287990Z WARN httproutes.gateway.networking.k8s.io: kube_runtime::watcher: watch list error with 403: Api(ErrorResponse { status: "Failure", message: "httpr
2023-11-13T19:33:06.288026Z INFO httproutes.gateway.networking.k8s.io: kubert::errors: stream failed error=failed to perform initial object list: ApiError: httproutes.gateway.
2023-11-13T19:33:07.095527Z WARN httproutes.policy.linkerd.io: kube_client::client: Unsuccessful data error parse: 404 page not found
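The excerpt above repeats the same few messages every five seconds, so it can help to collapse it into a count per resource and HTTP status. Below is a minimal Python sketch that does this. The regular expression is an assumption based only on the line shapes shown in this thread, and `tally_api_errors` is a hypothetical helper name. As a general Kubernetes rule of thumb, a 404 on a list call usually means the API server is not serving that resource (for example, the CRD is not installed), while a 403 usually means RBAC does not allow the caller to list it.

```python
import re
from collections import Counter

# Assumed line shape: "<timestamp> <LEVEL> <resource>: ... <4xx/5xx status> ..."
LINE_RE = re.compile(
    r"(?P<level>INFO|WARN|ERROR)\s+(?P<resource>[\w.]+):.*?\b(?P<status>[45]\d{2})\b"
)


def tally_api_errors(lines):
    """Count (resource, status) pairs; lines without a status code are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match["resource"], int(match["status"]))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < policy-controller.log
    for (resource, status), n in sorted(tally_api_errors(sys.stdin).items()):
        print(f"{n:5d}  {status}  {resource}")
```

If the tally shows only 404s for httproutes.policy.linkerd.io and only 403s for httproutes.gateway.networking.k8s.io, as it appears to here, it may be worth checking that the CRDs and RBAC objects installed in the cluster come from the same release as the control plane.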

Query { name: Name("linkerd-dst-headless.linkerd.svc.cluster.local."), query_type: AAAA, query_class: IN }]
[435685.015687s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: no record found for Query { name: Name("linkerd-policy.linkerd.svc.cluster.local."), query_type: SRV, query_class: IN }; failed to resolve A record: no record found for Query { name: Name("linkerd-policy.linkerd.svc.cluster.local."), query_type: AAAA, query_class: IN } error.sources=[failed to resolve A record: no record found for Query { name: Name("linkerd-policy.linkerd.svc.cluster.local."), query_type: AAAA, query_class: IN }, no record found for Query { name: Name("linkerd-policy.linkerd.svc.cluster.local."), query_type: AAAA, query_class: IN }]
[435690.013672s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_app_core::control: Failed to resolve control-plane component error=failed SRV and A record lookups: failed to resolve SRV record: no record found for Query { name: Name("linkerd-dst-headless.linkerd.svc.cluster.local."), query_type: SRV, query_class: IN }; failed to resolve A record: no record found for Query { name: Name("linkerd-dst-headless.linkerd.svc.cluster.local."), query_type: AAAA, query_class: IN } error.sources=[failed to resolve A record: no record found for Query { name: Name("linkerd-dst-headless.linkerd.svc.cluster.local."), query_type: AAAA, query_class: IN }, no record found for Query { name: Name("linkerd-dst-headless.linkerd.svc.cluster.local."), query_type: AAAA, query_class: IN }]

I believe it is this part of the values.yaml: connectAddr: "1.1.1.1:20001". If I take that out I get "Failed to validate networking configuration. Please ensure iptables rules are rewriting traffic as expected".

# network validator configuration
# This runs on a host that uses iptables to reroute network traffic. The validator
# ensures that iptables is correctly routing requests before we start linkerd.
networkValidator:
  # -- Log level for the network-validator
  # @default -- debug
  logLevel: debug
  # -- Log format (plain or json) for network-validator
  # @default -- plain
  logFormat: plain
  # -- Address to which the network-validator will attempt to connect. we expect this to be rewritten
  #usermod# trying to reach external DNS
  connectAddr: "1.1.1.1:20001"
  # -- Address to which network-validator listens to requests from itself
  listenAddr: "0.0.0.0:4140"
  # -- Timeout before network-validator fails to validate the pod's network connectivity
  timeout: "10s"
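A note on connectAddr: judging by the comment in the values above ("we expect this to be rewritten"), the address is meant to be intercepted by iptables inside the pod rather than actually contacted, so a public IP there does not by itself mean the pod is calling out. When the rules are missing, though, the connection attempt would head toward that address, and in an air-gapped cluster a public IP would likely just time out. The Python sketch below only classifies such an address; `describe_connect_addr` is a hypothetical helper, and it assumes an IPv4 "host:port" string as in the values above.

```python
import ipaddress


def describe_connect_addr(addr):
    """Split an IPv4 'host:port' string and report whether the IP is public.

    Hypothetical helper for illustration; bracketed IPv6 is not handled.
    """
    host, _, port = addr.rpartition(":")
    ip = ipaddress.ip_address(host)
    return {"ip": str(ip), "port": int(port), "is_global": ip.is_global}


if __name__ == "__main__":
    for candidate in ("1.1.1.1:20001", "10.0.0.1:20001"):
        print(candidate, describe_connect_addr(candidate))
```

An address flagged as `is_global` is not wrong in itself. It only tells you that a failed redirect will show up as a timeout rather than a quick refusal.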

I was able to get it working. For some reason the helm install method did not work offline, so I had to use linkerd install to generate the YAML and then modify that file. Thanks all for the assistance.

Additionally, I had been using provided files from a different release. I manually ran helm pull for the files, started again from scratch, and was then able to install offline via Helm. So I now have both the Helm and kubectl installs working offline. Thanks again for the assistance.