Hello! I’m currently evaluating the use of Linkerd in our EKS cluster. I stepped through a replay of your excellent workshop “Linkerd Certificate Management Deep Dive” and think I have a handle on our cert management strategy. However, I have some questions about the security and operational implications of longer vs. shorter lifespans for intermediate CAs managed by cert-manager.
First: My understanding is that it’s best practice to rotate these certificates frequently so that exposed private keys are not usable for long. I’m curious what the risk of long-lived certs is in a concrete sense. What could a malicious actor do with the identity cert and private key? Would this malicious actor need to have gained access to our nodes to wreak havoc?
Second: Are you folks aware of any operational risks associated with cert-manager automatic renewals? Are there any implications of keeping the certificates very short-lived? Based on what I’m reading, it seems like it all depends on our org’s security policies/postures, but I’m wondering if there is a concrete tradeoff here.
Any guidance on either of these points is very much appreciated!
The broad concern with long-lived certificates is that the longer they’re in use, the more opportunity an evildoer has to steal them. (This is often expressed as the axiom that the longer a key is in use, the more valuable it becomes.) Also note that you will usually not know when a compromise happens. You might know when it’s exploited, but then again you might not – this is another way in which minimizing the opportunity is important.
In concrete terms, once an evildoer has a certificate’s private key, they can assume the identity of the entity that owns the certificate. So, for example, if they steal a trust anchor’s private key, they can generate their own identity issuer certificate, rotate that in, and then do pretty much whatever they like with Linkerd’s mTLS: they could take over workloads, eavesdrop on mTLS communication, etc. (though they could not decrypt communications from before the certificate swap happened).
This sort of maliciousness generally does require access to the cluster, but not the Node. (If you have privileged access to the Node, you can do much worse things.)
I’m not personally aware of operational issues with cert-manager’s renewals for the identity issuer. For the trust anchor, you’ll ideally go in after you know that nothing is using the old trust anchor and remove its public key from the bundle. The major operational issue I know of for very short-lived certs is that linkerd check will complain if the expiry period for the identity issuer or trust anchor is less than 60 days.
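To make that 60-day threshold concrete, here is a minimal Python sketch of the comparison. It is not how linkerd check is implemented; the function name expires_too_soon and the constant WARN_THRESHOLD are hypothetical, and the expiry timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 60-day figure comes from the linkerd check behavior
# described above; the constant name is hypothetical.
WARN_THRESHOLD = timedelta(days=60)


def expires_too_soon(not_after, now=None, threshold=WARN_THRESHOLD):
    """Return True if a certificate expiring at `not_after` has less than
    `threshold` of validity left, measured from `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now < threshold


# Illustrative expiry times, not taken from a real cluster.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
short_lived_issuer = now + timedelta(hours=48)
long_lived_issuer = now + timedelta(days=365)

print(expires_too_soon(short_lived_issuer, now))  # True
print(expires_too_soon(long_lived_issuer, now))   # False
```

In other words, an identity issuer that is deliberately rotated every couple of days will always sit under the threshold, so the warning is expected in that setup rather than a sign that renewal is broken.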
I think that all makes sense. We are planning to keep the trust anchor key away from the cluster for just that reason. Let’s say that an evildoer with access to the cluster gets hold of the identity issuer cert and private key. Would this also allow a hypothetical evildoer with access to the cluster to take over workloads and eavesdrop on mTLS communication, for example?
Yes, from the point they get the identity issuer private key…
…at least theoretically. I should probably point out that actually mounting that attack is, uh, complex. Great reason to automate quickly rotating the identity issuer!
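As a rough illustration of why quick rotation helps, the sketch below works out when a certificate would be renewed and how long a stolen copy would still be within its validity period. It assumes cert-manager's duration and renewBefore semantics (renewal is triggered renewBefore ahead of expiry); the 48h and 25h values, the theft time, and the function names are assumptions picked for the example, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def rotation_schedule(issued_at, duration, renew_before):
    """Return (renew_at, expires_at) for a certificate issued at `issued_at`.

    Assumes renewal is triggered `renew_before` ahead of expiry.
    """
    expires_at = issued_at + duration
    renew_at = expires_at - renew_before
    return renew_at, expires_at


def remaining_validity(stolen_at, expires_at):
    """How long a certificate stolen at `stolen_at` stays within its
    validity period (never negative)."""
    return max(expires_at - stolen_at, timedelta(0))


# Hypothetical values for illustration only.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
renew_at, expires_at = rotation_schedule(
    issued, duration=timedelta(hours=48), renew_before=timedelta(hours=25)
)

print(renew_at)  # 2024-01-01 23:00:00+00:00
# Suppose the issuer key is stolen 12 hours after issuance.
print(remaining_validity(issued + timedelta(hours=12), expires_at))  # 1 day, 12:00:00
```

With a one-year issuer certificate the same theft would leave roughly a year of validity, which is the gap that automated rotation closes.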