No data from `linkerd viz routes` in booksapp demo

When following along with the booksapp demo, I don’t get any route metrics from:

» linkerd viz -n booksapp routes svc/webapp
ROUTE                       SERVICE   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
GET /                        webapp         -     -             -             -             -
GET /authors/{id}            webapp         -     -             -             -             -
GET /books/{id}              webapp         -     -             -             -             -
POST /authors                webapp         -     -             -             -             -
POST /authors/{id}/delete    webapp         -     -             -             -             -
POST /authors/{id}/edit      webapp         -     -             -             -             -
POST /books                  webapp         -     -             -             -             -
POST /books/{id}/delete      webapp         -     -             -             -             -
POST /books/{id}/edit        webapp         -     -             -             -             -
[DEFAULT]                    webapp         -     -             -             -             -

Any ideas on where to start troubleshooting this? `linkerd check` doesn’t report any problems. I’m on stable-2.13.3 for all components (control plane, proxies, and local CLI).

I do see errors in the tap deployment’s logs when I try to view routes in the dashboard:

time="2023-05-16T16:39:09Z" level=info msg="Tapping 3 pods for target: \"namespace:\\\"booksapp\\\" type:\\\"deployment\\\" name:\\\"webapp\\\"\""
time="2023-05-16T16:39:09Z" level=info msg="Establishing tap on 10.200.31.242:4190"
time="2023-05-16T16:39:09Z" level=info msg="Establishing tap on 10.200.42.95:4190"
time="2023-05-16T16:39:09Z" level=info msg="Establishing tap on 10.200.110.1:4190"
time="2023-05-16T16:39:12Z" level=error msg="[10.200.110.1] encountered an error: rpc error: code = Canceled desc = context canceled"
time="2023-05-16T16:39:12Z" level=error msg="[10.200.31.242] encountered an error: rpc error: code = Canceled desc = context canceled"
time="2023-05-16T16:39:12Z" level=error msg="[10.200.42.95] encountered an error: rpc error: code = Canceled desc = context canceled"

I don’t get those errors when using the CLI, though; there is just no data in the output.

I see no errors at all in the metrics-api pod’s logs.

When I run

linkerd viz tap deploy/webapp -o wide | grep req

`rt_route` is missing from all the responses. The troubleshooting docs say:

"Getting regexes to match can be tough and the ordering is important. Pay attention to rt_route. If it is missing entirely, compare the :path to the regex you'd like for it to match, and use a tester with the Golang flavor of regex."

But I’m using the ServiceProfiles provided by the demo, so I have no idea what’s wrong here.

I tried simplifying the ServiceProfile so that it only has:

spec:
  routes:
    - condition:
        method: POST
        pathRegex: /authors
      name: POST /authors
    - condition:
        method: POST
        pathRegex: /books
      name: POST /books

but I’m still getting nothing, and there’s still no `rt_route` in the tap output.
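To follow the docs’ advice about testing regexes with the Golang flavor, here is a minimal Go sketch (standard library `regexp` only) that runs the simplified routes above against a few sample requests. It assumes, rather than confirms, that the proxy treats `pathRegex` as matching the whole path and picks the first route whose method and regex both match; the `route` type and `matchRoute` helper are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// route mirrors one entry of a ServiceProfile's spec.routes list:
// an HTTP method plus a pathRegex condition.
type route struct {
	name      string
	method    string
	pathRegex string
}

// matchRoute returns the name of the first route whose method and
// pathRegex both match the request, or "" if none does. The regex is
// wrapped in ^(?:...)$ on the assumption that pathRegex must match the
// entire path rather than a substring.
func matchRoute(routes []route, method, path string) (string, error) {
	for _, r := range routes {
		re, err := regexp.Compile("^(?:" + r.pathRegex + ")$")
		if err != nil {
			return "", fmt.Errorf("route %q: %w", r.name, err)
		}
		if r.method == method && re.MatchString(path) {
			return r.name, nil
		}
	}
	return "", nil
}

func main() {
	// The simplified profile from the post above.
	routes := []route{
		{"POST /authors", "POST", "/authors"},
		{"POST /books", "POST", "/books"},
	}
	for _, req := range [][2]string{
		{"POST", "/authors"},
		{"POST", "/books"},
		{"POST", "/books/42/edit"},
		{"GET", "/books"},
	} {
		name, err := matchRoute(routes, req[0], req[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if name == "" {
			name = "[DEFAULT]" // no route matched
		}
		fmt.Printf("%-6s %-16s -> %s\n", req[0], req[1], name)
	}
}
```

If `POST /authors` and `POST /books` resolve to their routes here, the regexes themselves are unlikely to be the problem, which would point toward the proxy not picking up the ServiceProfile at all.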

Hi @johnj

Here are a few things you can try:

The command:
linkerd diagnostics proxy-metrics -n booksapp deploy/webapp | grep route_response_total
will show metrics directly from a Linkerd proxy. You should see a `route_response_total` metric, and it should have an `rt_route` label. But since `rt_route` isn’t showing up in tap for you, I suspect it won’t show up here either.

The next thing I’d check is that the names of the ServiceProfiles match the names of the Services:

> kubectl -n booksapp get serviceprofiles
NAME                                 AGE
authors.booksapp.svc.cluster.local   3m41s
books.booksapp.svc.cluster.local     3m35s
webapp.booksapp.svc.cluster.local    3m59s
> kubectl -n booksapp get svc            
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
authors   ClusterIP   10.96.87.222    <none>        7001/TCP   4m50s
books     ClusterIP   10.96.211.148   <none>        7002/TCP   4m50s
webapp    ClusterIP   10.96.249.55    <none>        7000/TCP   4m50s

In particular, make sure that the ServiceProfile names include the correct namespace and cluster domain. If your cluster domain is something other than cluster.local, you will need to rename the ServiceProfile resources provided in the demo.
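The naming rule in this reply can be turned into a quick check. A minimal Go sketch, assuming the `<service>.<namespace>.svc.<cluster-domain>` convention described here; `expectedProfileName` and `missingProfiles` are hypothetical helpers, and the name lists are copied from the kubectl output above.

```go
package main

import (
	"fmt"
	"sort"
)

// expectedProfileName builds the FQDN-style name a ServiceProfile needs
// for a Service: <service>.<namespace>.svc.<cluster-domain>.
func expectedProfileName(service, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s.%s.svc.%s", service, namespace, clusterDomain)
}

// missingProfiles returns the Services that have no ServiceProfile whose
// name exactly matches the expected FQDN, sorted alphabetically.
func missingProfiles(services, profiles []string, namespace, clusterDomain string) []string {
	have := make(map[string]bool, len(profiles))
	for _, p := range profiles {
		have[p] = true
	}
	var missing []string
	for _, s := range services {
		if !have[expectedProfileName(s, namespace, clusterDomain)] {
			missing = append(missing, s)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Names copied from the kubectl output above.
	services := []string{"authors", "books", "webapp"}
	profiles := []string{
		"authors.booksapp.svc.cluster.local",
		"books.booksapp.svc.cluster.local",
		"webapp.booksapp.svc.cluster.local",
	}
	missing := missingProfiles(services, profiles, "booksapp", "cluster.local")
	if len(missing) == 0 {
		fmt.Println("every Service has a matching ServiceProfile")
	} else {
		fmt.Println("Services without a matching ServiceProfile:", missing)
	}
}
```

Passing a different cluster domain (for example a hypothetical `example.internal`) would report every Service as missing, which is exactly the mismatch to look out for.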

My k8s cluster is a standard AWS EKS cluster with nothing unusual going on, and it uses the cluster.local domain.

The names are what you would expect:

» k get sp
NAME                                 AGE
authors.booksapp.svc.cluster.local   11m
books.booksapp.svc.cluster.local     11m
webapp.booksapp.svc.cluster.local    11m

» k get svc
NAME      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
authors   ClusterIP   172.20.245.197   <none>        7001/TCP   12m
books     ClusterIP   172.20.106.74    <none>        7002/TCP   12m
webapp    ClusterIP   172.20.231.141   <none>        7000/TCP   12m

Since I posted this, I’ve deleted the booksapp namespace and recreated everything from scratch, and now I have even stranger behavior. The routes for the webapp service are working, but the others aren’t.

» linkerd viz -n booksapp routes svc/webapp
ROUTE                       SERVICE   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
GET /                        webapp   100.00%   0.6rps          12ms          19ms          20ms
GET /authors/{id}            webapp   100.00%   0.6rps           8ms          10ms          10ms
GET /books/{id}              webapp   100.00%   1.3rps           8ms          10ms          10ms
POST /authors                webapp   100.00%   0.6rps           8ms          16ms          19ms
POST /authors/{id}/delete    webapp   100.00%   0.7rps          18ms          29ms          30ms
POST /authors/{id}/edit      webapp         -        -             -             -             -
POST /books                  webapp    46.75%   2.8rps          10ms          19ms          20ms
POST /books/{id}/delete      webapp   100.00%   0.7rps           8ms          10ms          10ms
POST /books/{id}/edit        webapp    52.94%   1.1rps          73ms          97ms          99ms
[DEFAULT]                    webapp         -        -             -             -             -

» linkerd viz -n booksapp routes svc/authors
ROUTE                       SERVICE   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
DELETE /authors/{id}.json   authors         -     -             -             -             -
GET /authors.json           authors         -     -             -             -             -
GET /authors/{id}.json      authors         -     -             -             -             -
HEAD /authors/{id}.json     authors         -     -             -             -             -
POST /authors.json          authors         -     -             -             -             -
[DEFAULT]                   authors         -     -             -             -             -

» linkerd viz -n booksapp routes svc/books
ROUTE                     SERVICE   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
DELETE /books/{id}.json     books         -     -             -             -             -
GET /books.json             books         -     -             -             -             -
GET /books/{id}.json        books         -     -             -             -             -
POST /books.json            books         -     -             -             -             -
PUT /books/{id}.json        books         -     -             -             -             -
[DEFAULT]                   books         -     -             -             -             -

The routes for the linkerd-viz metrics-api aren’t working either:

» linkerd viz -n linkerd-viz routes svc/metrics-api
ROUTE                           SERVICE   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
POST /api/v1/Edges          metrics-api         -     -             -             -             -
POST /api/v1/Gateways       metrics-api         -     -             -             -             -
POST /api/v1/ListPods       metrics-api         -     -             -             -             -
POST /api/v1/ListServices   metrics-api         -     -             -             -             -
POST /api/v1/SelfCheck      metrics-api         -     -             -             -             -
POST /api/v1/StatSummary    metrics-api         -     -             -             -             -
POST /api/v1/TopRoutes      metrics-api         -     -             -             -             -
[DEFAULT]                   metrics-api         -     -             -             -             -

Strange indeed. I’d recommend looking at the proxy metrics using the diagnostics command above as well as looking for these metrics in Prometheus to see if they have been successfully scraped.
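For the Prometheus side, here is a Go sketch that queries the standard Prometheus HTTP API (`/api/v1/query`) and lists the `rt_route` values it finds. It assumes you have port-forwarded the linkerd-viz Prometheus to localhost:9090 (e.g. `kubectl -n linkerd-viz port-forward svc/prometheus 9090`) and that scraped series carry a `namespace` label; the service name, the label, and the `routesFromResponse` helper are assumptions for illustration, not something confirmed in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// queryResponse models the parts of a Prometheus /api/v1/query
// response used here (an instant-vector result).
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  []interface{}     `json:"value"` // [unix_ts, "value"]
		} `json:"result"`
	} `json:"data"`
}

// routesFromResponse returns the rt_route label value of each result
// series; a series without the label yields "".
func routesFromResponse(body []byte) ([]string, error) {
	var qr queryResponse
	if err := json.Unmarshal(body, &qr); err != nil {
		return nil, err
	}
	if qr.Status != "success" {
		return nil, fmt.Errorf("query status %q", qr.Status)
	}
	var routes []string
	for _, s := range qr.Data.Result {
		routes = append(routes, s.Metric["rt_route"])
	}
	return routes, nil
}

func main() {
	// Assumption: kubectl -n linkerd-viz port-forward svc/prometheus 9090
	const base = "http://localhost:9090/api/v1/query"
	// Group by rt_route so series missing the label show up as an empty value.
	q := `sum by (rt_route) (route_response_total{namespace="booksapp"})`
	resp, err := http.Get(base + "?query=" + url.QueryEscape(q))
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		os.Exit(1)
	}
	routes, err := routesFromResponse(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad response:", err)
		os.Exit(1)
	}
	if len(routes) == 0 {
		fmt.Println("no route_response_total series found for booksapp")
	}
	for _, r := range routes {
		fmt.Printf("rt_route=%q\n", r)
	}
}
```

A result containing only an empty `rt_route` would mean series were scraped without route attribution; no results at all suggests either the series weren’t scraped or the proxy never emitted them.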

It appears this is a regression in route metrics. I’ve reopened the corresponding GitHub issue to track it down.

I appreciate it! I’ll keep an eye on that issue.

Has there been any movement on this? I see the issue is locked on GitHub, but I don’t see any comments on it or links to a PR.

Hi @johnj

The issue got locked when it was closed but didn’t get unlocked when it was reopened. I’ve unlocked the issue now.