No data from `linkerd viz routes` in booksapp demo

When following along with the booksapp demo, I don’t get any metrics from

» linkerd viz -n booksapp routes svc/webapp
ROUTE                       SERVICE   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
GET /                        webapp         -     -             -             -             -
GET /authors/{id}            webapp         -     -             -             -             -
GET /books/{id}              webapp         -     -             -             -             -
POST /authors                webapp         -     -             -             -             -
POST /authors/{id}/delete    webapp         -     -             -             -             -
POST /authors/{id}/edit      webapp         -     -             -             -             -
POST /books                  webapp         -     -             -             -             -
POST /books/{id}/delete      webapp         -     -             -             -             -
POST /books/{id}/edit        webapp         -     -             -             -             -
[DEFAULT]                    webapp         -     -             -             -             -

Any ideas on where to look to start troubleshooting this? linkerd check doesn’t report any problems. I’m on stable-2.13.3 for all the components (control plane, proxies, and local client).

I do see an error in the tap deployment when I try to view routes in the dashboard.

time="2023-05-16T16:39:09Z" level=info msg="Tapping 3 pods for target: \"namespace:\\\"booksapp\\\" type:\\\"deployment\\\" name:\\\"webapp\\\"\""
time="2023-05-16T16:39:09Z" level=info msg="Establishing tap on 10.200.31.242:4190"
time="2023-05-16T16:39:09Z" level=info msg="Establishing tap on 10.200.42.95:4190"
time="2023-05-16T16:39:09Z" level=info msg="Establishing tap on 10.200.110.1:4190"
time="2023-05-16T16:39:12Z" level=error msg="[10.200.110.1] encountered an error: rpc error: code = Canceled desc = context canceled"
time="2023-05-16T16:39:12Z" level=error msg="[10.200.31.242] encountered an error: rpc error: code = Canceled desc = context canceled"
time="2023-05-16T16:39:12Z" level=error msg="[10.200.42.95] encountered an error: rpc error: code = Canceled desc = context canceled"

But I don’t get those errors when using the CLI. Just no data in the output.

I see no errors at all in the metrics-api pod.

When I run

linkerd viz tap deploy/webapp -o wide | grep req

rt_route is missing from all the responses. The troubleshooting docs say

Getting regexes to match can be tough and the ordering is important. Pay attention to rt_route. If it is missing entirely, compare the :path to the regex you'd like for it to match, and use a tester with the Golang flavor of regex.

But as I said, I’m using the serviceprofile generated from the demo… so I have no idea what’s wrong here.

I tried simplifying the serviceprofile so it only has

spec:
  routes:
    - condition:
        method: POST
        pathRegex: /authors
      name: POST /authors
    - condition:
        method: POST
        pathRegex: /books
      name: POST /books

but I’m still getting nothing, and there is still no rt_route in the taps.
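One way to compare a :path from tap against a pathRegex, as the quoted docs suggest, is a small shell helper. This is only a sketch: `path_matches` is a hypothetical helper name, it assumes Linkerd treats pathRegex as anchored so that it must match the whole path, and it uses grep's extended regex rather than Go's, which should agree for simple patterns like the two above but is not guaranteed to in general.

```shell
# path_matches REGEX PATH
# Hypothetical helper: succeeds only if PATH matches REGEX in full.
# Assumption: pathRegex is treated as anchored (^...$), hence grep -x.
# Note: grep -E is POSIX ERE, not Go's regex flavor; fine for simple patterns.
path_matches() {
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Example: report whether each sample path would hit the "POST /authors" route.
for p in /authors /authors/1/edit; do
  if path_matches '/authors' "$p"; then
    echo "$p: match"
  else
    echo "$p: no match"
  fi
done
```

Under that anchoring assumption, a request to /authors/1/edit would not be attributed to a route whose pathRegex is just /authors.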

Hi @johnj

Here are a few things you can try:

The command:
linkerd diagnostics proxy-metrics -n booksapp deploy/webapp | grep route_response_total
will show metrics directly from a Linkerd proxy. You should see a route_response_total metric and it should have the rt_route label. But since rt_route isn’t showing up in tap for you, I suspect it probably won’t show up here either.
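To make that check easier to read, the proxy metrics can be reduced to just the route names they carry. The following is a rough sketch, assuming the output is standard Prometheus text format with one series per line; `route_labels` is a hypothetical helper name, not part of the Linkerd CLI.

```shell
# route_labels: reads Prometheus text-format metrics on stdin and prints the
# distinct rt_route label values found on route_response_total series.
# Hypothetical helper; prints nothing if no series carries rt_route.
route_labels() {
  awk '/^route_response_total[{]/ {
    if (match($0, /rt_route="[^"]*"/))
      print substr($0, RSTART + 10, RLENGTH - 11)
  }' | sort -u
}

# Usage (same command as above, piped through the helper):
#   linkerd diagnostics proxy-metrics -n booksapp deploy/webapp | route_labels
```

Empty output would mean the proxy itself is not labelling responses with a route, which points at the ServiceProfile or the proxy rather than at Prometheus or the viz extension.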

The next thing I’d check is the names of the ServiceProfiles, to ensure they match the names of the Services:

> kubectl -n booksapp get serviceprofiles
NAME                                 AGE
authors.booksapp.svc.cluster.local   3m41s
books.booksapp.svc.cluster.local     3m35s
webapp.booksapp.svc.cluster.local    3m59s
> kubectl -n booksapp get svc            
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
authors   ClusterIP   10.96.87.222    <none>        7001/TCP   4m50s
books     ClusterIP   10.96.211.148   <none>        7002/TCP   4m50s
webapp    ClusterIP   10.96.249.55    <none>        7000/TCP   4m50s

In particular, make sure that the names of the ServiceProfiles include the correct namespace and cluster domain. If you have a cluster domain other than cluster.local, you will need to change the names of the ServiceProfile resources given in the demo.
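The naming pattern visible in the listing above is service, namespace, `svc`, then cluster domain. As a minimal sketch, a hypothetical helper (`expected_sp_name` is not a Linkerd or kubectl command) can build the name a ServiceProfile should have so it can be compared against `kubectl get serviceprofiles`:

```shell
# expected_sp_name SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Hypothetical helper: prints the ServiceProfile name expected for a Service,
# following the <service>.<namespace>.svc.<cluster-domain> pattern shown above.
# The cluster domain defaults to cluster.local.
expected_sp_name() {
  svc=$1
  ns=$2
  domain=${3:-cluster.local}
  printf '%s.%s.svc.%s\n' "$svc" "$ns" "$domain"
}

# Print the expected names for the three booksapp services.
for s in authors books webapp; do
  expected_sp_name "$s" booksapp
done
```

On a cluster with a custom domain, pass it as the third argument and compare the result with the names of the ServiceProfile resources.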

My k8s cluster is standard AWS EKS, nothing unusual going on. It uses the cluster.local domain.

Names of things are what you would expect.

» k get sp
NAME                                 AGE
authors.booksapp.svc.cluster.local   11m
books.booksapp.svc.cluster.local     11m
webapp.booksapp.svc.cluster.local    11m

» k get svc
NAME      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
authors   ClusterIP   172.20.245.197   <none>        7001/TCP   12m
books     ClusterIP   172.20.106.74    <none>        7002/TCP   12m
webapp    ClusterIP   172.20.231.141   <none>        7000/TCP   12m

Since I posted this, I’ve deleted the booksapp namespace and recreated everything from scratch, and now I have even stranger behavior. The routes for the webapp service are working, but the others aren’t.

» linkerd viz -n booksapp routes svc/webapp
ROUTE                       SERVICE   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
GET /                        webapp   100.00%   0.6rps          12ms          19ms          20ms
GET /authors/{id}            webapp   100.00%   0.6rps           8ms          10ms          10ms
GET /books/{id}              webapp   100.00%   1.3rps           8ms          10ms          10ms
POST /authors                webapp   100.00%   0.6rps           8ms          16ms          19ms
POST /authors/{id}/delete    webapp   100.00%   0.7rps          18ms          29ms          30ms
POST /authors/{id}/edit      webapp         -        -             -             -             -
POST /books                  webapp    46.75%   2.8rps          10ms          19ms          20ms
POST /books/{id}/delete      webapp   100.00%   0.7rps           8ms          10ms          10ms
POST /books/{id}/edit        webapp    52.94%   1.1rps          73ms          97ms          99ms
[DEFAULT]                    webapp         -        -             -             -             -

» linkerd viz -n booksapp routes svc/authors
ROUTE                       SERVICE   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
DELETE /authors/{id}.json   authors         -     -             -             -             -
GET /authors.json           authors         -     -             -             -             -
GET /authors/{id}.json      authors         -     -             -             -             -
HEAD /authors/{id}.json     authors         -     -             -             -             -
POST /authors.json          authors         -     -             -             -             -
[DEFAULT]                   authors         -     -             -             -             -

» linkerd viz -n booksapp routes svc/books
ROUTE                     SERVICE   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
DELETE /books/{id}.json     books         -     -             -             -             -
GET /books.json             books         -     -             -             -             -
GET /books/{id}.json        books         -     -             -             -             -
POST /books.json            books         -     -             -             -             -
PUT /books/{id}.json        books         -     -             -             -             -
[DEFAULT]                   books         -     -             -             -             -

In addition, the routes for the linkerd-viz metrics-api aren’t working either.

» linkerd viz -n linkerd-viz routes svc/metrics-api
ROUTE                           SERVICE   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
POST /api/v1/Edges          metrics-api         -     -             -             -             -
POST /api/v1/Gateways       metrics-api         -     -             -             -             -
POST /api/v1/ListPods       metrics-api         -     -             -             -             -
POST /api/v1/ListServices   metrics-api         -     -             -             -             -
POST /api/v1/SelfCheck      metrics-api         -     -             -             -             -
POST /api/v1/StatSummary    metrics-api         -     -             -             -             -
POST /api/v1/TopRoutes      metrics-api         -     -             -             -             -
[DEFAULT]                   metrics-api         -     -             -             -             -

Strange indeed. I’d recommend looking at the proxy metrics using the diagnostics command above as well as looking for these metrics in Prometheus to see if they have been successfully scraped.

It appears this is a regression on route metrics. I’ve reopened this issue to track it down.

I appreciate it! I’ll keep an eye on that issue.

Has there been any movement on this? I see the issue is locked in GitHub, but I don’t see any comments about it or links to any PR.

Hi @johnj

The issue got locked when it was closed but didn’t get unlocked when it was reopened. I’ve unlocked the issue now.

I still have this problem after upgrading to 2.13.4. Even looking at linkerd’s own components, I see no data on the Route Metrics tab, in the linkerd-viz/metrics-api deployment, for example. Live Calls seems to be working fine, but I never see anything show up under Route Metrics.

Route metrics not working seems to be a pretty major thing, and the fact that I don’t see anyone else talking about it makes me think that maybe there’s something wrong with my configuration. But… I’m not doing anything unusual. I installed linkerd and linkerd-viz with the helm charts. I’m running on vanilla AWS EKS, k8s 1.27, using stock AMIs for nodes, using the default VPC CNI.

@alpeb you said you think this could be a regression. Do you have any more info about this?

A description of the current behavior is available here: `route_response_latency_ms` histogram inbound metrics no longer exposed (linkerd/linkerd2 issue #10521 on GitHub)

This still isn’t working in 2.13.6. Is there any ETA for a fix? I’d really like to be able to use route metrics at some point.

Can you confirm what the steps in this comment return for you? (Please replace 2.13.1 with 2.13.6 there, and give it a couple of minutes after rolling out the deployments for the stats to kick in.)

I get no output from linkerd viz tap -n my-ns deploy/my-app -ojson | grep -A 1 l5d-dst-canonical

I used linkerd viz profile to generate a really simple serviceprofile for my service. It’s entirely possible I’m doing something wrong but if so I can’t see it.

I tried restarting the deployment, more than once actually. It didn’t help.

My service profile looks like this:

apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: my-app.my-ns.svc.cluster.local
  namespace: my-ns
spec:
  routes:
  - condition:
      method: GET
      pathRegex: /healthz
    name: GET /healthz
  - condition:
      method: GET
      pathRegex: /metrics
    name: GET /metrics

Note that the client pod making requests to your my-app service needs to be meshed, and bounced once the ServiceProfile is in place. The /healthz endpoint is usually hit from the kubelet, which cannot be meshed. In any case, I think it’d be easier for us to stick to the booksapp example so we don’t introduce more unknowns.
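A quick way to see which client pods are not meshed is to list each pod's containers and flag the ones without the proxy sidecar. This is a sketch under assumptions: it assumes the injected sidecar appears in `.spec.containers` under the name `linkerd-proxy`, and `unmeshed_pods` is a hypothetical helper name.

```shell
# unmeshed_pods: reads lines of "<pod-name> <container> <container> ..." on
# stdin and prints the pods that have no container named linkerd-proxy.
# Hypothetical helper; assumes the sidecar is listed in .spec.containers.
unmeshed_pods() {
  awk '{
    meshed = 0
    for (i = 2; i <= NF; i++)
      if ($i == "linkerd-proxy") meshed = 1
    if (!meshed) print $1
  }'
}

# Usage, with <client-namespace> standing in for wherever the callers run:
#   kubectl -n <client-namespace> get pods \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}' \
#     | unmeshed_pods
```

Any pod it prints would need to be injected and restarted before its requests can show up in route metrics.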

Note that the client pod making requests to your my-app service needs to be meshed

Ohhhh that’s a very important piece of info. If that is in the docs anywhere, I missed it. Thanks for that, I’ll try setting up a test with requests from meshed services instead of the health and metrics endpoints.