Re: [regression] significant delays when secureboot is enabled since 6.10

From: Jarkko Sakkinen
Date: Thu Sep 12 2024 - 09:16:44 EST


On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote:
> On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote:
> > On Tue, 2024-09-10 at 16:28 +0300, Jarkko Sakkinen wrote:
> > > On Tue Sep 10, 2024 at 3:57 PM EEST, James Bottomley wrote:
> > > > On Tue, 2024-09-10 at 15:48 +0300, Jarkko Sakkinen wrote:
> > > > > On Tue Sep 10, 2024 at 3:39 PM EEST, Jarkko Sakkinen wrote:
> > > > > > On Tue Sep 10, 2024 at 12:05 PM EEST, Roberto Sassu wrote:
> > > > > > > On Tue, 2024-09-10 at 11:01 +0200, Linux regression
> > > > > > > tracking
> > > > > > > (Thorsten
> > > > > > > Leemhuis) wrote:
> > > > > > > > Hi, Thorsten here, the Linux kernel's regression tracker.
> > > > > > > >
> > > > > > > > James, Jarkoo, I noticed a report about a regression in
> > > > > > > > bugzilla.kernel.org that appears to be caused by this
> > > > > > > > change of
> > > > > > > > yours:
> > > > > > > >
> > > > > > > > 6519fea6fd372b ("tpm: add hmac checks to
> > > > > > > > tpm2_pcr_extend()")
> > > > > > > > [v6.10-rc1]
> > > > > > > >
> > > > > > > > As many (most?) kernel developers don't keep an eye on
> > > > > > > > the bug
> > > > > > > > tracker,
> > > > > > > > I decided to forward it by mail. To quote from
> > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229 :
> > > > > > > >
> > > > > > > > > When secureboot is enabled,
> > > > > > > > > the kernel boot time is ~20 seconds after 6.10 kernel.
> > > > > > > > > it's ~7 seconds on 6.8 kernel version.
> > > > > > > > >
> > > > > > > > > When secureboot is disabled,
> > > > > > > > > the boot time is ~7 seconds too.
> > > > > > > > >
> > > > > > > > > Reproduced on both AMD and Intel platform on ThinkPad
> > > > > > > > > X1 and
> > > > > > > > > T14.
> > > > > > > > >
> > > > > > > > > It probably caused autologin failure and micmute led
> > > > > > > > > not
> > > > > > > > > loaded on AMD platform.
> > > > > > > >
> > > > > > > > It was later bisected to the change mentioned above. See
> > > > > > > > the
> > > > > > > > ticket for
> > > > > > > > more details.
> > > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > I suspect I encountered the same problem:
> > > > > > >
> > > > > > > https://lore.kernel.org/linux-integrity/b8a7b3566e6014ba102ab98e10ede0d574d8930e.camel@xxxxxxxxxxxxxxx/
> > > > > > >
> > > > > > > Going to provide more info there.
> > > > > >
> > > > > > I suppose you are going try to acquire the tracing data I
> > > > > > asked?
> > > > > > That would be awesome, thanks for taking the troube.  Let's
> > > > > > look
> > > > > > at the data and draw conclusions based on that.
> > > > > >
> > > > > > Workaround is pretty simple: CONFIG_TCG_TPM2_HMAC=n to the
> > > > > > kernel
> > > > > > configuration disables the feature.
> > > > > >
> > > > > > For making decisions what to do with the  we are talking
> > > > > > about ~2
> > > > > > week window estimated, given the Vienna conference slows
> > > > > > things
> > > > > > down, so I hope my workaround is good enough before that.
> > > > >
> > > > > I can enumerate three most likely ways to address the issue:
> > > > >
> > > > > 1. Strongest: drop from defconfig.
> > > > > 2. Medium: leave to defconfig but add an opt-in kernel command-
> > > > > line
> > > > >    parameter.
> > > > > 3. Lightest: if we can based on tracing data nail the
> > > > > regression in
> > > > >    sustainable schedule, fix it.
> > > >
> > > > Actually, there's a fourth: not use sessions for the PCR extend
> > > > (if
> > > > we'd got the timings when I asked, this was going to be my
> > > > suggestion
> > > > if they came back problematic).  This seems only to be a problem
> > > > for
> > > > IMA measured boot (because it does lots of extends).  If
> > > > necessary this
> > > > could even be wrapped in a separate config or boot option that
> > > > only
> > > > disables HMAC on extend if IMA (so we still get security for
> > > > things
> > > > like sd-boot)
> > >
> > > I can buy that but with a twist that make it an opt-in kernel
> > > command
> > > line option. We don't want to take already existing functionality
> > > away
> > > from those who might want to use it (given e.g. hardening
> > > requirements),
> > > and with that basis opt-in (by default disabled) would be more
> > > balanced
> > > way to address the issue.
> > >
> > > Please do a send a patch!
> >
> > I made few measurements. I have a Fedora 38 VM with TPM passthrough.
> >
> > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host)
> >
> > QEMU:
> >
> > rc  qemu-kvm                                          1:4.2-
> > 3ubuntu6.27
> > ii  qemu-system-x86                                   1:6.2+dfsg-
> > 2ubuntu6.22
> >
> >
> > TPM2_PT_MANUFACTURER:
> >   raw: 0x49465800
> >   value: "IFX"
> > TPM2_PT_VENDOR_STRING_1:
> >   raw: 0x534C4239
> >   value: "SLB9"
> > TPM2_PT_VENDOR_STRING_2:
> >   raw: 0x36373000
> >   value: "670"
> >
> >
> > No HMAC:
> >
> > # tracer: function_graph
> > #
> > # CPU  DURATION                  FUNCTION CALLS
> > # |     |   |                     |   |   |   |
> >  0)               |  tpm2_pcr_extend() {
> >  0)   1.112 us    |    tpm_buf_append_hmac_session();
> >  0) # 6360.029 us |    tpm_transmit_cmd();
> >  0) # 6415.012 us |  }
> >
> >
> > HMAC:
> >
> > # tracer: function_graph
> > #
> > # CPU  DURATION                  FUNCTION CALLS
> > # |     |   |                     |   |   |   |
> >  1)               |  tpm2_pcr_extend() {
> >  1)               |    tpm2_start_auth_session() {
> >  1) * 36976.99 us |      tpm_transmit_cmd();
> >  1) * 84746.51 us |      tpm_transmit_cmd();
> >  1) # 3195.083 us |      tpm_transmit_cmd();
> >  1) @ 126795.1 us |    }
> >  1)   2.254 us    |    tpm_buf_append_hmac_session();
> >  1)   3.546 us    |    tpm_buf_fill_hmac_session();
> >  1) * 24356.46 us |    tpm_transmit_cmd();
> >  1)   3.496 us    |    tpm_buf_check_hmac_response();
> >  1) @ 151171.0 us |  }
>
> Well, unfortunately, that tells us that it's the TPM itself that's
> taking the time processing the security overhead. The ordering of the
> commands in tpm2_start_auth_session() shows
>
> 37ms for context restore of null key
> 85ms for start session with encrypted salt
> 3ms to flush null key
> -----
> 125ms
>
> If we context save the session, we'd likely only bear a single 37ms
> cost to restore it (replacing the total 125ms). However, there's
> nothing we can do about the extend execution going from 6ms to 24ms, so
> I could halve your current boot time with security enabled (it's
> currently 149ms, it would go to 61ms, but it's still 10x slower than
> the unsecured extend at 6ms)
>
> James

I'll hold for better benchmarks.

BR, Jarkko