Re: [PATCH] tpm-buf: memory-safe allocations

From: Jarkko Sakkinen

Date: Fri May 29 2026 - 18:35:36 EST

On Fri, May 29, 2026 at 10:08:52AM -0400, James Bottomley wrote:
> On Tue, 2026-05-26 at 10:53 +0300, Jarkko Sakkinen wrote:
> > On Mon, May 25, 2026 at 01:50:51PM -0400, James Bottomley wrote:
> > > On Fri, 2026-05-22 at 04:35 +0300, Jarkko Sakkinen wrote:
> > > > Decouple kzalloc from buffer creation, so that a managed
> > > > allocation
> > > > can be
> > > > used:
> > > >
> > > > struct tpm_buf *buf __free(kfree) buf =
> > > > kzalloc(TPM_BUFSIZE,
> > > > GFP_KERNEL);
> > > > if (!buf)
> > > > return -ENOMEM;
> > > >
> > > > tpm_buf_init(buf, TPM_BUFSIZE);
> > > >
> > > > Alternatively, stack allocations are also possible:
> > > >
> > > > u8 buf_data[512];
> > > > struct tpm_buf *buf = (struct tpm_buf *)buf_data;
> > > > tpm_buf_init(buf, sizeof(buf_data));
> > >
> > > This isn't really a good idea from a security point of view.
> > > Remember the buffer has to be big enough for both the sent and the
> > > received data. Today we simply set TPM_BUFSIZE to the maximum
> > > amount a TPM requires and all the send and receives just work. If
> > > we let callers set this size, we're asking for them to get it wrong
> > > (or at least forget about the receive part) and for us to get a DMA
> > > overrun from the TPM ... which might be potentially exploitable
> > > depending on how it occurs (think of an unseal of user chosen data
> > > overrunning).
> >
> > It's one patch so you're free to remark the call sites where this
> > happens. This is not a majorn concern at all.
>
> Nearly twenty years ago, when the kernel was a lot smaller, a then
> kernel luminary called Rusty Russell realized we needed to pay much
> more attention to how we design APIs inside the kernel if we wanted it
> to grow successfully. He published his initial thoughts and gave talks
> at both the kernel summit and OLS on it:
>
> https://ozlabs.org/~rusty/index.cgi/tech/2008-03-18.html
>
> The key point that's always stuck with me is "hard to misuse beats easy
> to use". Later he came up with a rating scale (now known as the Rusty
> API classification):
>
> https://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
>
> and for chuckles and grins on April fools day he came up with a
> negative rating ridiculing some of our dafter API choices:
>
> https://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
>
> The point for this patch set is that the sizing of the original tpm_buf
> interface scores 10/10 on the Rusty scale (it's impossible to get
> wrong). Simply threading size through the whole API, as this patch
> does, may look like the right answer, but it causes a massive reduction
> in API score. In fact, since the buffer has to be sized not only
> according to what goes in, but also what gets returned and this is
> nowhere mentioned in the new documentation it scores -3 (read the
> documentation and you can still get it wrong). Now by mentioning the
> sizing problems in the doc, you can probably get it up to +3 (read the
> documentation and you'll get it right) but my question was not if you
> got it wrong somewhere in the patch but whether we couldn't do a whole
> lot better in terms of API score by designing a better API.
>
> A key point about the 185 version of the TPM spec is that it's really
> only a few commands that need larger buffers (the Post Quantum ML-KEM
> keys) which doesn't apply to most of the in-kernel TPM callsites.
> Since tpm_buf_init takes the ordinal, we can actually tell at runtime
> (or compile time if the ordinal is a constant) if the command would
> need a larger buffer. We can also tell from the TPM properties whether
> the TPM itself can take a larger buffer, so for every current TPM we
> could retain the original score 10/10 API and warn at runtime if there
> might be a problem. Then the larger keys seem to fit into 8k, so we
> could still retain most of the original API properties of being
> difficult to misuse simply by having an 8k size flag (which we could
> ignore if the TPM doesn't support it) and warn at runtime if
> tpm_buf_init sends an ordinal which might need a larger buffer. At
> worst we should be able to get to an API which scores 5/10 (do it right
> or it will break at runtime).
>
> Regards,
>
> James

This patch has pre-existed long before any of this post-quantum stuff,
and there are good reasons so to have buffers managed given e.g.,
complexity of tpm2-sessions code. It prevents any major risk for
memory leaks.

Trenchboot extends the use buffers to early boot and we want a robust
structure. I'm not going to spend my time reading about philosophical
aspects of API design. There are quantitative reasons to decrease
the risk of memory leaks.

BR, Jarkko