LinkerD seems to swallow certain requests, left hanging on frontend + tap

Hey all,

I’m also x-posting this from the LinkerD Slack:

We've got a bit of a strange issue that looks like Linkerd is swallowing some requests without actually erroring out the call, so the call just hangs and never completes, both from our frontend and in linkerd tap.

  • We are running on version 25.4.4 (deployed via Helm chart), and did try upgrading to 25.11.1 with no luck
  • We send a request from our frontend, which makes a call to a backend service within K8s. We can see via the linkerd-viz tap section that the call is being passed to the correct pod (via header-based HTTP routing); however, it never actually gives us any status on the call, and it never looks like it makes it to the pod? Here's a screenshot from tap
  • We put the destination pods in debug log level, and managed to get this out of it when the call is made, and we see it being cancelled by linkerd, but nothing else happens and the call just hangs forever
  • Here is also the request details gathered from tap in the first column of the pop-up (again sanitised):

Hey @KrisM4c! I think we need to know a bit more about what’s going on exactly. Is your frontend another meshed workload? Are there authorization policies, HTTPRoutes, etc., in play? Do I correctly understand that your frontend workload is making an HTTP call that just hangs forever? Does it ever time out? What exactly does the frontend see?

As it stands we don’t really have enough to figure out where to look, so it’d be great to get more information. :slightly_smiling_face:

Hey @Flynn ,

No worries at all! Please find the extra information below:

  • The frontend is not part of the meshed workloads (it runs externally and serves SPAs, so all traffic comes from the client's browser)
    • Traffic is sent from the client's browser to the meshed backend
    • User (browser) > Traefik (meshed - ingress) > service-a (meshed - proxy)
  • HTTPRoutes are in place for Header-Based Routing (If header x-branch-name exists, send to service-a-xxx, else send to service-a-primary)
  • It never times out, and hangs forever until the user cancels the call by stopping the page loading.
  • All the frontend (user) sees is the page forever loading, and the network tab in the browser staying in a pending state until page load is manually stopped.
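
To make the routing rule above concrete, here is a rough Python model of the decision the HTTPRoutes are meant to encode. This is only a sketch and not the actual HTTPRoute config: the `pick_backend` function, the `branch_backends` mapping, and the fallback for an unknown branch are all assumptions for illustration.

```python
def pick_backend(headers, branch_backends, default="service-a-primary"):
    """Model of the rule: x-branch-name present -> branch backend, else primary."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    branch = normalised.get("x-branch-name")
    if branch is None:
        # No branch header: traffic goes to the primary backend.
        return default
    # Assumption: a branch with no matching backend also falls back to primary.
    return branch_backends.get(branch, default)
```

With a hypothetical mapping `{"so-28007": "service-a-so-28007"}`, a request carrying `x-branch-name: so-28007` would resolve to the branch backend, and a request without the header would resolve to `service-a-primary`.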

Heya @Flynn , just wondering if you have had a chance to see this at all? :slight_smile:

Sorry for the delay! I was completely offline for the holidays.

The destination controller is responsible for figuring out which endpoints should be used for a given connection; it’s distinct from the proxy handling the connection. Can you get the debug logs for the proxy in the traefik container? That’s probably a good next step. Is your Traefik configured with the Linkerd middleware? (cf. Handling ingress traffic | Linkerd)

Good morning @Flynn ,

No worries at all, I thought that might’ve been the case. I hope you had a good time off!

So we do currently have Traefik set up with the linkerd.io/inject: ingress annotation. I have a call booked for 13:00 GMT to get the rest of these logs, and I'll get them sent over once gathered :slight_smile:

Cheers,

Kris

Hi @Flynn

Please see the attached logs (here). For reference, the call that we are interested in is:

GET > /api/v1/auth/refresh
Header: x-branch-name: so-28007
Service: be-api-gateway-portal (Not v2)
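
In case it helps to reproduce this outside the browser, here is a minimal Python sketch that sends the same request with a client-side timeout, so a hang shows up as "timed out" instead of a call that stays pending forever. The base URL is a placeholder (not our real ingress hostname), and `build_request` and `probe` are just illustrative helper names.

```python
import socket
import urllib.error
import urllib.request

# Placeholder ingress address (assumption); replace with the real Traefik entry point.
BASE_URL = "https://ingress.example.com"


def build_request(base_url=BASE_URL, branch="so-28007"):
    """Build the GET /api/v1/auth/refresh request with the branch routing header."""
    return urllib.request.Request(
        base_url + "/api/v1/auth/refresh",
        headers={"x-branch-name": branch},
        method="GET",
    )


def probe(request, timeout_s=10.0):
    """Send the request and report a status, a timeout, or a connection error."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return f"status {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, just with a 4xx/5xx status.
        return f"status {err.code}"
    except (TimeoutError, socket.timeout):
        # No response arrived within timeout_s: the hang described above.
        return "timed out"
    except urllib.error.URLError as err:
        if isinstance(err.reason, (TimeoutError, socket.timeout)):
            return "timed out"
        return f"connection error: {err.reason}"
```

Calling `probe(build_request())` with and without the `x-branch-name` header should show whether only the branch-routed call hangs.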

Please let me know if you need anything further!