We are using the Grafana Agent to receive incoming traces, optionally process them, and forward them on to Grafana Tempo. Under the hood, the Grafana Agent is essentially running the OpenTelemetry Collector. This setup works for all of our applications, but Linkerd is not sending traces, and I assume it's a misconfiguration on my end.
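In other words, the intended flow is roughly:

applications / linkerd-proxy --(OTLP)--> grafana-agent (OTel Collector) --(OTLP gRPC)--> Tempo distributor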
There is a Service listening for OTLP HTTP and gRPC traffic:
kubectl get svc -n grafana-agent
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                    AGE
grafana-agent   ClusterIP   172.20.18.93   <none>        80/TCP,4317/TCP,4318/TCP   27h
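For reference, the receiver side of the Agent's Flow configuration is wired up roughly like this (a simplified sketch; the "default" component label is illustrative):

otelcol.receiver.otlp "default" {
  // Listen for OTLP gRPC on 4317 and OTLP HTTP on 4318,
  // matching the ports exposed by the Service above.
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  http {
    endpoint = "0.0.0.0:4318"
  }

  output {
    // Hand every received trace to the Tempo exporter shown below.
    traces = [otelcol.exporter.otlp.to_tempo.input]
  }
}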
The collector configuration ends by exporting the traces to Tempo via gRPC:
otelcol.exporter.otlp "to_tempo" {
  client {
    endpoint = "tempo-distributed-distributor.tempo.svc.cluster.local:4317"

    tls {
      insecure             = true
      insecure_skip_verify = true
    }
  }
}
This works for our applications; I can view their traces in Grafana.
nginx-ingress, for example, defaults otlp-collector-port to 4317 and only needs:
enable-opentelemetry: "true"
otlp-collector-host: "grafana-agent.grafana-agent.svc.cluster.local"
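That is, the controller ConfigMap carries roughly the following (the ConfigMap name and namespace below are illustrative; use whatever your install created):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller    # illustrative; use your controller's ConfigMap
  namespace: ingress-nginx
data:
  enable-opentelemetry: "true"
  otlp-collector-host: "grafana-agent.grafana-agent.svc.cluster.local"
  # otlp-collector-port is left at its 4317 default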
I have the linkerd-jaeger extension deployed via its Helm chart with the values below. I've tried both 4318 and 4317 for the collector port; neither appears to work.
collector:
  enabled: false
jaeger:
  enabled: false
webhook:
  externalSecret: true
  injectCaFrom: linkerd-jaeger/jaeger-injector
  collectorSvcAddr: grafana-agent.grafana-agent:4317
  collectorSvcAccount: grafana-agent
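Deployed with something along these lines (the repo alias, namespace, and values file name are just what I happen to use):

helm repo add linkerd https://helm.linkerd.io/stable
helm upgrade --install linkerd-jaeger linkerd/linkerd-jaeger \
  --namespace linkerd-jaeger --create-namespace \
  -f jaeger-values.yaml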
I can see this is configured on my linkerd-proxy:
LINKERD2_PROXY_TRACE_COLLECTOR_SVC_ADDR: grafana-agent.grafana-agent:4317
LINKERD2_PROXY_TRACE_COLLECTOR_SVC_NAME: grafana-agent.grafana-agent.serviceaccount.identity.linkerd.cluster.local
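(Checked with something like the following, where <app-namespace> and <meshed-pod> are placeholders for one of our meshed workloads.)

kubectl -n <app-namespace> get pod <meshed-pod> -o yaml \
  | grep -A1 LINKERD2_PROXY_TRACE_COLLECTOR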
And I see errors on the linkerd-proxy that should be sending the trace:
{"timestamp":"[ 1351.438936s]","level":"WARN","fields":{"message":"Service failed","error":"endpoint 172.20.18.93:4317: channel closed"},"target":"linkerd_reconnect","spans":[{"name":"opencensus"},{"addr":"grafana-agent.grafana-agent.svc.cluster.local:4317","name":"controller"},{"addr":"172.20.18.93:4317","name":"endpoint"}],"threadId":"ThreadId(1)"}
{"timestamp":"[ 1394.200939s]","level":"WARN","fields":{"message":"Service failed","error":"endpoint 172.20.18.93:4317: channel closed"},"target":"linkerd_reconnect","spans":[{"name":"opencensus"},{"addr":"grafana-agent.grafana-agent.svc.cluster.local:4317","name":"controller"},{"addr":"172.20.18.93:4317","name":"endpoint"}],"threadId":"ThreadId(1)"}
{"timestamp":"[ 1416.017672s]","level":"WARN","fields":{"message":"Service failed","error":"endpoint 172.20.18.93:4317: channel closed"},"target":"linkerd_reconnect","spans":[{"name":"opencensus"},{"addr":"grafana-agent.grafana-agent.svc.cluster.local:4317","name":"controller"},{"addr":"172.20.18.93:4317","name":"endpoint"}],"threadId":"ThreadId(1)"}
{"timestamp":"[ 1502.128800s]","level":"WARN","fields":{"message":"Service failed","error":"endpoint 172.20.18.93:4317: channel closed"},"target":"linkerd_reconnect","spans":[{"name":"opencensus"},{"addr":"grafana-agent.grafana-agent.svc.cluster.local:4317","name":"controller"},{"addr":"172.20.18.93:4317","name":"endpoint"}],"threadId":"ThreadId(1)"}