Re: [regression] significant delays when secureboot is enabled since 6.10
From: James Bottomley
Date: Wed Sep 11 2024 - 08:22:07 EST
On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote:
> On Tue, 2024-09-10 at 16:28 +0300, Jarkko Sakkinen wrote:
> > On Tue Sep 10, 2024 at 3:57 PM EEST, James Bottomley wrote:
> > > On Tue, 2024-09-10 at 15:48 +0300, Jarkko Sakkinen wrote:
> > > > On Tue Sep 10, 2024 at 3:39 PM EEST, Jarkko Sakkinen wrote:
> > > > > On Tue Sep 10, 2024 at 12:05 PM EEST, Roberto Sassu wrote:
> > > > > > On Tue, 2024-09-10 at 11:01 +0200, Linux regression
> > > > > > tracking
> > > > > > (Thorsten
> > > > > > Leemhuis) wrote:
> > > > > > > Hi, Thorsten here, the Linux kernel's regression tracker.
> > > > > > >
> > > > > > > James, Jarkoo, I noticed a report about a regression in
> > > > > > > bugzilla.kernel.org that appears to be caused by this
> > > > > > > change of
> > > > > > > yours:
> > > > > > >
> > > > > > > 6519fea6fd372b ("tpm: add hmac checks to
> > > > > > > tpm2_pcr_extend()")
> > > > > > > [v6.10-rc1]
> > > > > > >
> > > > > > > As many (most?) kernel developers don't keep an eye on
> > > > > > > the bug
> > > > > > > tracker,
> > > > > > > I decided to forward it by mail. To quote from
> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229 :
> > > > > > >
> > > > > > > > When secureboot is enabled,
> > > > > > > > the kernel boot time is ~20 seconds after 6.10 kernel.
> > > > > > > > it's ~7 seconds on 6.8 kernel version.
> > > > > > > >
> > > > > > > > When secureboot is disabled,
> > > > > > > > the boot time is ~7 seconds too.
> > > > > > > >
> > > > > > > > Reproduced on both AMD and Intel platform on ThinkPad
> > > > > > > > X1 and
> > > > > > > > T14.
> > > > > > > >
> > > > > > > > It probably caused autologin failure and micmute led
> > > > > > > > not
> > > > > > > > loaded on AMD platform.
> > > > > > >
> > > > > > > It was later bisected to the change mentioned above. See
> > > > > > > the
> > > > > > > ticket for
> > > > > > > more details.
> > > > > >
> > > > > > Hi
> > > > > >
> > > > > > I suspect I encountered the same problem:
> > > > > >
> > > > > > https://lore.kernel.org/linux-integrity/b8a7b3566e6014ba102ab98e10ede0d574d8930e.camel@xxxxxxxxxxxxxxx/
> > > > > >
> > > > > > Going to provide more info there.
> > > > >
> > > > > I suppose you are going try to acquire the tracing data I
> > > > > asked?
> > > > > That would be awesome, thanks for taking the troube. Let's
> > > > > look
> > > > > at the data and draw conclusions based on that.
> > > > >
> > > > > Workaround is pretty simple: CONFIG_TCG_TPM2_HMAC=n to the
> > > > > kernel
> > > > > configuration disables the feature.
> > > > >
> > > > > For making decisions what to do with the we are talking
> > > > > about ~2
> > > > > week window estimated, given the Vienna conference slows
> > > > > things
> > > > > down, so I hope my workaround is good enough before that.
> > > >
> > > > I can enumerate three most likely ways to address the issue:
> > > >
> > > > 1. Strongest: drop from defconfig.
> > > > 2. Medium: leave to defconfig but add an opt-in kernel command-
> > > > line
> > > > parameter.
> > > > 3. Lightest: if we can based on tracing data nail the
> > > > regression in
> > > > sustainable schedule, fix it.
> > >
> > > Actually, there's a fourth: not use sessions for the PCR extend
> > > (if
> > > we'd got the timings when I asked, this was going to be my
> > > suggestion
> > > if they came back problematic). This seems only to be a problem
> > > for
> > > IMA measured boot (because it does lots of extends). If
> > > necessary this
> > > could even be wrapped in a separate config or boot option that
> > > only
> > > disables HMAC on extend if IMA (so we still get security for
> > > things
> > > like sd-boot)
> >
> > I can buy that but with a twist that make it an opt-in kernel
> > command
> > line option. We don't want to take already existing functionality
> > away
> > from those who might want to use it (given e.g. hardening
> > requirements),
> > and with that basis opt-in (by default disabled) would be more
> > balanced
> > way to address the issue.
> >
> > Please do a send a patch!
>
> I made few measurements. I have a Fedora 38 VM with TPM passthrough.
>
> Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host)
>
> QEMU:
>
> rc qemu-kvm 1:4.2-
> 3ubuntu6.27
> ii qemu-system-x86 1:6.2+dfsg-
> 2ubuntu6.22
>
>
> TPM2_PT_MANUFACTURER:
> raw: 0x49465800
> value: "IFX"
> TPM2_PT_VENDOR_STRING_1:
> raw: 0x534C4239
> value: "SLB9"
> TPM2_PT_VENDOR_STRING_2:
> raw: 0x36373000
> value: "670"
>
>
> No HMAC:
>
> # tracer: function_graph
> #
> # CPU DURATION FUNCTION CALLS
> # | | | | | | |
> 0) | tpm2_pcr_extend() {
> 0) 1.112 us | tpm_buf_append_hmac_session();
> 0) # 6360.029 us | tpm_transmit_cmd();
> 0) # 6415.012 us | }
>
>
> HMAC:
>
> # tracer: function_graph
> #
> # CPU DURATION FUNCTION CALLS
> # | | | | | | |
> 1) | tpm2_pcr_extend() {
> 1) | tpm2_start_auth_session() {
> 1) * 36976.99 us | tpm_transmit_cmd();
> 1) * 84746.51 us | tpm_transmit_cmd();
> 1) # 3195.083 us | tpm_transmit_cmd();
> 1) @ 126795.1 us | }
> 1) 2.254 us | tpm_buf_append_hmac_session();
> 1) 3.546 us | tpm_buf_fill_hmac_session();
> 1) * 24356.46 us | tpm_transmit_cmd();
> 1) 3.496 us | tpm_buf_check_hmac_response();
> 1) @ 151171.0 us | }
Well, unfortunately, that tells us that it's the TPM itself that's
taking the time processing the security overhead. The ordering of the
commands in tpm2_start_auth_session() shows
37ms for context restore of null key
85ms for start session with encrypted salt
3ms to flush null key
-----
125ms
If we context save the session, we'd likely only bear a single 37ms
cost to restore it (replacing the total 125ms). However, there's
nothing we can do about the extend execution going from 6ms to 24ms, so
I could halve your current boot time with security enabled (it's
currently 149ms, it would go to 61ms, but it's still 10x slower than
the unsecured extend at 6ms)
James