Can't get Retries to work

notturingtested · April 25, 2024, 2:58am

Hi everyone, I am new to Linkerd and I have a question about retries. I have configured a ServiceProfile:

apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  creationTimestamp: null
  name: report.test-linkerd.svc.cluster.local
  namespace: test-linkerd
spec:
  routes:
    - name: All GET Requests
      condition:
        method: GET
        pathRegex: ".*"
      isRetryable: true
    - name: All POST Requests
      condition:
        method: POST
        pathRegex: ".*"
      isRetryable: true
  retryBudget:
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s

The service profile has two routes, one for GET and one for POST, and a retry budget.
When I try to trigger a retry (by killing the report pod), no retries happen.
I started to investigate:

➜  ~ linkerd viz routes --to deploy/report -n test-linkerd -o wide deploy/api-gateway
ROUTE                        SERVICE   EFFECTIVE_SUCCESS   EFFECTIVE_RPS   ACTUAL_SUCCESS   ACTUAL_RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
All GET Requests    portfolio-report                   -               -                -            -             -             -             -
All POST Requests   portfolio-report                   -               -                -            -             -             -             -
[DEFAULT]           portfolio-report                   -               -                -            -             -             -             -

And I see that the routes are not being matched. I have checked the logs of the api-gateway and I see that the requests are being made to the report service.
Then I ran the following command to check the metrics of the report service:

➜ ~ linkerd diagnostics proxy-metrics -n test-linkerd deploy/report | grep route_response_total
# HELP route_response_total Total count of HTTP responses.
# TYPE route_response_total counter
route_response_total{direction="inbound",dst="report.test-linkerd.svc.cluster.local:80",rt_route="All POST Requests",status_code="200",classification="success",grpc_status="",error=""} 5
route_response_total{direction="inbound",dst="report.test-linkerd.svc.cluster.local:8090",rt_route="All GET Requests",status_code="200",classification="success",grpc_status="",error=""} 9
route_response_total{direction="inbound",dst="report.test-linkerd.svc.cluster.local:8090",rt_route="All GET Requests",status_code="304",classification="success",grpc_status="",error=""} 44

And I see that the report service is receiving requests with the routes All GET Requests and All POST Requests.
Additionally, in the dashboard, I do not see any requests from the api-gateway to the report service. They do show as “meshed”, but there is no green bar or metrics.
The only other clue I have found is the logs of the linkerd proxy container:

{"timestamp":"[  1355.773887s]","level":"INFO","fields":{"message":"Connection closed","error":"connection closed before message completed","client.addr":"10.62.71.212:35084","server.addr":"10.62.68.10:8080"},"target":"linkerd_app_core::serve","spans":[{"name":"inbound"}],"threadId":"ThreadId(1)"}

Am I missing something in the configuration of the service profile?
Am I misunderstanding how retries work in Linkerd?
Any help would be appreciated. Thanks!
Additional Info:

Client version: stable-2.14.10
Server version: stable-2.14.10

linkerd check and linkerd viz check come back all good.

Alex · May 9, 2024, 1:15am

Hi!

Retries always happen on the client side, so you’ll want to take a close look at whatever is sending requests to the report service. Is it the api-gateway you mentioned? Is it meshed? Is it sending on a port which is skipped by the proxy? Can you look at the proxy metrics of the client pod?
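A rough sketch of that last check, assuming the client is the api-gateway deployment from the original post. The metrics shown earlier in the thread are all direction="inbound" on the report pod. Retries are performed by the client's proxy, so the lines of interest are the direction="outbound" ones on the client. The helper name outbound_route_metrics is made up for illustration and is not part of the linkerd CLI.

```shell
# Hypothetical helper (not a linkerd command): reads proxy metrics on stdin
# and keeps only the per-route response counters for outbound traffic.
outbound_route_metrics() {
  grep '^route_response_total' | grep 'direction="outbound"'
}

# Intended use, assuming api-gateway is the client that calls report:
#   linkerd diagnostics proxy-metrics -n test-linkerd deploy/api-gateway | outbound_route_metrics
```

If this prints nothing while the gateway is sending traffic, the client's proxy is probably not handling those requests as routed HTTP (for example, because the port is skipped), which would be consistent with the empty linkerd viz routes output above.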
