After upgrading linkerd-control-plane from version 1.9.3 (stable-2.12.1) to version 1.12.3 (stable-2.13.3), I started getting “503 Service Unavailable” responses whenever a large number of concurrent connections is used (roughly 100 or more).
When I remove the linkerd-proxy sidecar, everything works without any errors.
The topology is plain pod-to-pod HTTP communication. The client and the server each run as a single pod instance.
I am using fortio as a load-generation tool.
Please note that this issue started only after upgrading the linkerd-control-plane chart.
I set the log level of the linkerd-proxy sidecar to debug and found the following log lines, which I think may be relevant:
server-side:
DEBUG ThreadId(01) inbound:accept{client.addr=10.129.0.64:37346}:server{port=8080}:http: linkerd_proxy_http::server: The client is shutting down the connection res=Err(hyper::Error(Io, Custom { kind: NotConnected, error: "server: Transport endpoint is not connected (os error 107)" }))
DEBUG ThreadId(01) inbound:accept{client.addr=10.129.0.64:37346}: linkerd_app_core::serve: Connection closed reason=connection error: server: Transport endpoint is not connected (os error 107)
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}: linkerd_tls::server: Peeked bytes from TCP stream sz=0
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}: linkerd_tls::server: Attempting to buffer TLS ClientHello after incomplete peek
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}: linkerd_tls::server: Reading bytes from TCP stream buf.capacity=8192
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}: linkerd_tls::server: Read bytes from TCP stream buf.len=108
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}: linkerd_detect: Detected protocol protocol=Some(HTTP/1) elapsed=3.17µs
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}: linkerd_proxy_http::server: Creating HTTP service version=HTTP/1
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}: linkerd_proxy_http::server: Handling as HTTP version=HTTP/1
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}:http: linkerd_app_inbound::policy::http: Request authorized server.group= server.kind=default server.name=all-unauthenticated route.group= route.kind=default route.name=probe authz.group= authz.kind=default authz.name=probe client.tls=None(NoClientHello) client.ip=10.129.0.2
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}:http: linkerd_proxy_http::server: The client is shutting down the connection res=Ok(())
DEBUG ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:accept{client.addr=10.129.0.2:44476}: linkerd_app_core::serve: Connection closed
client-side:
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}: linkerd_detect: Detected protocol protocol=Some(HTTP/1) elapsed=4.51µs
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}: linkerd_proxy_http::server: Creating HTTP service version=HTTP/1
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}: linkerd_app_outbound::sidecar: Using ClientPolicy routes
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}: linkerd_proxy_http::server: Handling as HTTP version=HTTP/1
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}:http: linkerd_app_outbound::http::logical::policy::router: Selected route meta=RouteRef(Default { name: "http" })
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}:http: linkerd_stack::loadshed: Service has become unavailable
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}:http: linkerd_stack::loadshed: Service shedding load
INFO ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}:http:rescue{client.addr=10.129.0.64:41544}: linkerd_app_core::errors::respond: HTTP/1.1 request failed error=logical service 172.30.159.54:8080: service unavailable error.sources=[service unavailable]
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}:http: linkerd_app_core::errors::respond: Handling error on HTTP connection status=503 Service Unavailable version=HTTP/1.1 close=true
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}:http: linkerd_proxy_http::server: The client is shutting down the connection res=Ok(())
DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}: linkerd_app_core::serve: Connection closed
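For anyone triaging a larger capture of these logs, here is a minimal sketch of how the load-shedding events and 503 responses could be counted per outbound target. It assumes the proxy logs were saved as plain text in the same format as the lines above. The helper name `summarize` and the two regular expressions are hypothetical and not part of any Linkerd tooling.

```python
import re
from collections import Counter

# Assumed line shape, taken from the logs above:
# LEVEL ThreadId(NN) <span context>: <module path>: <message>
LINE_RE = re.compile(r"^(?P<level>[A-Z]+)\s+ThreadId\(\d+\)\s+(?P<rest>.*)$")
# The outbound target appears in the span context as proxy{addr=IP:PORT}.
PROXY_ADDR_RE = re.compile(r"proxy\{addr=(?P<addr>[^}]+)\}")


def summarize(lines):
    """Count load-shedding events and 503 responses per outbound target address."""
    shed = Counter()
    unavailable = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a proxy log line
        addr_match = PROXY_ADDR_RE.search(m.group("rest"))
        addr = addr_match.group("addr") if addr_match else "unknown"
        if "linkerd_stack::loadshed: Service shedding load" in line:
            shed[addr] += 1
        if "status=503 Service Unavailable" in line:
            unavailable[addr] += 1
    return shed, unavailable


# Three lines copied from the client-side logs above.
sample = [
    "DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}:http: linkerd_stack::loadshed: Service shedding load",
    "DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}:proxy{addr=172.30.159.54:8080}:http: linkerd_app_core::errors::respond: Handling error on HTTP connection status=503 Service Unavailable version=HTTP/1.1 close=true",
    "DEBUG ThreadId(01) outbound:accept{client.addr=10.129.0.64:41544}: linkerd_app_core::serve: Connection closed",
]
shed, unavailable = summarize(sample)
print(dict(shed), dict(unavailable))
```

Comparing these counts against the number of connections fortio opened may help show whether every 503 is preceded by a load-shedding event on the client-side proxy.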
I would appreciate any help with this issue.