Re: [regression] significant delays when secureboot is enabled since 6.10

From: James Bottomley
Date: Thu Sep 12 2024 - 09:26:47 EST


On Thu, 2024-09-12 at 16:16 +0300, Jarkko Sakkinen wrote:
> On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote:
> > On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote:
[...]
> > > I made few measurements. I have a Fedora 38 VM with TPM
> > > passthrough.
> > >
> > > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host)
> > >
> > > QEMU:
> > >
> > > rc  qemu-kvm                                          1:4.2-
> > > 3ubuntu6.27
> > > ii  qemu-system-x86                                   1:6.2+dfsg-
> > > 2ubuntu6.22
> > >
> > >
> > > TPM2_PT_MANUFACTURER:
> > >   raw: 0x49465800
> > >   value: "IFX"
> > > TPM2_PT_VENDOR_STRING_1:
> > >   raw: 0x534C4239
> > >   value: "SLB9"
> > > TPM2_PT_VENDOR_STRING_2:
> > >   raw: 0x36373000
> > >   value: "670"
> > >
> > >
> > > No HMAC:
> > >
> > > # tracer: function_graph
> > > #
> > > # CPU  DURATION                  FUNCTION CALLS
> > > # |     |   |                     |   |   |   |
> > >  0)               |  tpm2_pcr_extend() {
> > >  0)   1.112 us    |    tpm_buf_append_hmac_session();
> > >  0) # 6360.029 us |    tpm_transmit_cmd();
> > >  0) # 6415.012 us |  }
> > >
> > >
> > > HMAC:
> > >
> > > # tracer: function_graph
> > > #
> > > # CPU  DURATION                  FUNCTION CALLS
> > > # |     |   |                     |   |   |   |
> > >  1)               |  tpm2_pcr_extend() {
> > >  1)               |    tpm2_start_auth_session() {
> > >  1) * 36976.99 us |      tpm_transmit_cmd();
> > >  1) * 84746.51 us |      tpm_transmit_cmd();
> > >  1) # 3195.083 us |      tpm_transmit_cmd();
> > >  1) @ 126795.1 us |    }
> > >  1)   2.254 us    |    tpm_buf_append_hmac_session();
> > >  1)   3.546 us    |    tpm_buf_fill_hmac_session();
> > >  1) * 24356.46 us |    tpm_transmit_cmd();
> > >  1)   3.496 us    |    tpm_buf_check_hmac_response();
> > >  1) @ 151171.0 us |  }
> >
> > Well, unfortunately, that tells us that it's the TPM itself that's
> > taking the time processing the security overhead.  The ordering of
> > the commands in tpm2_start_auth_session() shows
> >
> >  37ms for context restore of null key
> >  85ms for start session with encrypted salt
> >   3ms to flush null key
> > -----
> > 125ms
> >
> > If we context save the session, we'd likely only bear a single 37ms
> > cost to restore it (replacing the total 125ms).  However, there's
> > nothing we can do about the extend execution going from 6ms to
> > 24ms, so I could halve your current boot time with security enabled
> > (it's currently 149ms, it would go to 61ms, but it's still 10x
> > slower than the unsecured extend at 6ms)
> >
> > James
>
> I'll hold for better benchmarks.

Well, yes, I'd like to see this for a variety of TPMs.

This one clearly shows it's the real time wait for the TPM (since it
dwarfs the CPU time calculation there's not much optimization we can do
on the kernel end). The one thing that's missing in all of this is
what was the TPM? but even if it's an outlier that's really bad at
crypto what should we do? We could have a blacklist that turns off the
extend hmac (or a whitelist that turns it on), but we can't simply say
too bad you need a better TPM.

James