HTTPRoute not working on subsequent calls

Hey all!

I'm having a few issues getting header-based (dynamic) routing working as expected. Please note the following has all been sanitised :slight_smile:

My plan is to have a full deployment of an environment and on-demand environments running side by side. By on-demand, I mean that if a single microservice out of the list changes, only that one is deployed, with a suffix on all of its Kubernetes resources. We are doing this all through Helm. This includes:

  • Deployment
  • Service
  • HTTPRoute

If the following values are provided, the full name of the app changes. Here is the relevant _helpers.tpl:

{{- define "app-name.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- if .Values.global.ondemand.enabled }}
{{- printf "%s-%s" .Values.fullnameOverride .Values.global.ondemand.branchName | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

And for the values.yaml

global:
  environmentName: ondemand

  ondemand:
    enabled: true
    branchName: "ond-1"

So on a full deployment, app-name.fullname will render as app-a, whereas on-demand it will be app-a-ond-1.
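For contrast, a full-environment install of the same chart would use values roughly like the sketch below. This is illustrative only: fullnameOverride: app-a and the environmentName are assumptions, since our real per-app values aren't shown above.

global:
  environmentName: staging   # illustrative value
  ondemand:
    enabled: false

fullnameOverride: app-a   # assumed per-app override; gives app-name.fullname = app-a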

For the HTTPRoute, we have templated it like so:

{{- if .Values.global.ondemand.enabled }}
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: {{ include "app-name.fullname" . }}
  labels:
    {{- include "app-name.labels" . | nindent 4 }}
spec:
  parentRefs:
    - name: {{ .Values.fullnameOverride }}-headless
      kind: Service
      group: core
      port: {{ .Values.service.port }}
  rules:
    - matches:
      - headers:
        - name: "x-branch-name"
          value: "{{ .Values.global.ondemand.branchName }}"
      backendRefs:
        - name: {{ include "app-name.fullname" . }}-headless
          port: {{ .Values.service.port }}
    - backendRefs:
      - name: {{ .Values.fullnameOverride }}-headless
        port: {{ .Values.service.port }}
{{- end -}}

This means that when an on-demand app is deployed, the rendered spec (which is what gets deployed to the cluster) will be:

apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  name: app-name-ond-1
  namespace: staging-nova
spec:
  parentRefs:
  - group: core
    kind: Service
    name: app-name-headless
    port: 8080
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: app-name-ond-1-headless
      port: 8080
      weight: 1
    matches:
    - headers:
      - name: x-branch-name
        type: Exact
        value: ond-1
      path:
        type: PathPrefix
        value: /
  - backendRefs:
    - group: ""
      kind: Service
      name: app-name-headless
      port: 8080
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /

On a full deployment, our traffic flow is:

TRAEFIK-INGRESS > API-GATEWAY > APP-A

We expect that if api-gateway-ond-1 and app-a-ond-1 exist (with their associated HTTPRoutes), providing the header x-branch-name: ond-1 will route us to api-gateway-ond-1-headless, and the subsequent call will go to app-a-ond-1-headless.

However, what seems to happen is:

TRAEFIK-INGRESS > API-GATEWAY-OND-1  > APP-A

All subsequent calls fail to route correctly even with the HTTPRoute in place. The applications themselves are passing on the x-branch-name header: it is visible in the request being made from api-gateway-ond-1 in linkerd viz tap.

The HTTPRoutes themselves look fine, so I'm not sure what's going on (example status field from an HTTPRoute):

status:
  parents:
  - conditions:
    - lastTransitionTime: "2023-09-27T15:07:51Z"
      message: ""
      reason: NoMatchingParent
      status: "False"
      type: Accepted
    - lastTransitionTime: "2023-09-27T15:07:51Z"
      message: ""
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
    controllerName: linkerd.io/policy-controller
    parentRef:
      group: core
      kind: Service
      name: app-a-ond-1-headless
      namespace: demand

Hopefully someone has seen this issue before or can help point out where this may be going wrong :smiley:

Cheers!
Kris

Hey Kris! Can you show the definition of the app-name-ond-1-headless Service too?

Thanks!

Heya Flynn,

Here is the helm template:

apiVersion: v1
kind: Service
metadata:
  name: {{ include "app-name.fullname" . }}-headless
  labels:
    {{- include "app-name.labels" . | nindent 4 }}
spec:
  clusterIP: None
  ports:
    - port: {{ .Values.service.port }}
      targetPort: http
      protocol: TCP
      name: http
  selector:
    {{- include "app-name.selectorLabels" . | nindent 4 }}

And the object for the full environment as deployed into the cluster:

apiVersion: v1
kind: Service
metadata:
  name: app-a-headless
  namespace: staging-nova
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: app-a
    app.kubernetes.io/instance: app-a
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

And the on-demand Service:

apiVersion: v1
kind: Service
metadata:
  name: app-a-ond-1-headless
  namespace: staging-nova
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/instance: app-a-ond-1
    app.kubernetes.io/name: app-a
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Just as a further update on this: I don't think the HTTPRoutes are working at all.

For example, going from Traefik (meshed) to API-GATEWAY, even though the header that matches the HTTPRoute definitely exists in the call (seen in linkerd viz), traffic is still sent to the incorrect application.

Happy to jump on a call to show everything; I'd rather not post the screenshots here due to identifiable information :slight_smile:

Or I can post via a private DM in your community Slack, @Flynn, if that makes it easier :smiley:

I can also post updates here (sanitised) for anyone else who experiences this issue in future!

Sooooo, just to give everyone the solution to the issues seen here, after a video call with Linkerd:

  • I am using Traefik 2. If you do not annotate the Traefik Deployment's pod template with linkerd.io/inject: ingress, it WILL NOT pick up any HTTPRoutes specified within the cluster (there's a sketch of the annotation after this list)
  • My HTTPRoutes used headless Services, which does not work: HTTPRoute traffic needs to go to the ClusterIP. If there is no ClusterIP (the headless Service has clusterIP: None), traffic will not be routed correctly! (A non-headless Service sketch follows Flynn's reply below.)
  • I was running 3 replicas of the Linkerd control plane pods, which seemed to introduce a significant lag in picking up new Kubernetes objects (HTTPRoutes in this case) and applying them to the mesh. After scaling down to a single replica of each, things were a lot better :smiley:
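For anyone else hitting the first point, the annotation goes on the Traefik pod template, so the injected proxy runs in ingress mode. A rough excerpt only — the Deployment name and namespace here are placeholders, not our real ones:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik        # placeholder name
  namespace: traefik   # placeholder namespace
spec:
  template:
    metadata:
      annotations:
        # Without ingress mode, traffic from Traefik does not honour the HTTPRoutes (see above)
        linkerd.io/inject: ingress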

Hopefully I captured it all in there, but many thanks to @Flynn for their time on the call and managing to get this all resolved! <3

Sure!

To expand on this a bit:

  • Traefik 1 & 2 are the same: they both need linkerd.io/inject: ingress. However, it's easy to miss this in the Linkerd & Ingress doc, because clicking on Traefik 2 at the top of the doc skips you past the caution about that, so we'll fix that.
  • Cluster IPs are actually required for all meshes that conform to the Gateway API for mesh right now (start with GAMMA - Kubernetes Gateway API if you want the gory details). I’m going to try to make this more clear in our docs too.
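To make that concrete with the Services from this thread: the backend needs a normal ClusterIP Service rather than a headless one. A sketch, reusing the app-a-ond-1 ports and selectors from above (the name is illustrative, since the -headless suffix no longer makes sense):

apiVersion: v1
kind: Service
metadata:
  name: app-a-ond-1
  namespace: staging-nova
spec:
  # No "clusterIP: None" here: the Service gets a regular ClusterIP,
  # which Gateway API mesh routing (GAMMA) relies on.
  type: ClusterIP
  ports:
    - name: http
      port: 8080
      protocol: TCP
      targetPort: http
  selector:
    app.kubernetes.io/instance: app-a-ond-1
    app.kubernetes.io/name: app-a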

So the lag is the one that I think we'll probably need to work on together later to sort out…
