Re: [regression] significant delays when secureboot is enabled since 6.10
From: Jarkko Sakkinen
Date: Sat Sep 14 2024 - 06:42:30 EST
On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote:
> On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote:
> > On Tue, 2024-09-10 at 16:28 +0300, Jarkko Sakkinen wrote:
> > > On Tue Sep 10, 2024 at 3:57 PM EEST, James Bottomley wrote:
> > > > On Tue, 2024-09-10 at 15:48 +0300, Jarkko Sakkinen wrote:
> > > > > On Tue Sep 10, 2024 at 3:39 PM EEST, Jarkko Sakkinen wrote:
> > > > > > On Tue Sep 10, 2024 at 12:05 PM EEST, Roberto Sassu wrote:
> > > > > > > > On Tue, 2024-09-10 at 11:01 +0200, Linux regression tracking
> > > > > > > > (Thorsten Leemhuis) wrote:
> > > > > > > > Hi, Thorsten here, the Linux kernel's regression tracker.
> > > > > > > >
> > > > > > > > > James, Jarkko, I noticed a report about a regression in
> > > > > > > > > bugzilla.kernel.org that appears to be caused by this
> > > > > > > > > change of yours:
> > > > > > > >
> > > > > > > > > 6519fea6fd372b ("tpm: add hmac checks to tpm2_pcr_extend()")
> > > > > > > > > [v6.10-rc1]
> > > > > > > >
> > > > > > > > As many (most?) kernel developers don't keep an eye on
> > > > > > > > the bug
> > > > > > > > tracker,
> > > > > > > > I decided to forward it by mail. To quote from
> > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229 :
> > > > > > > >
> > > > > > > > > When secureboot is enabled,
> > > > > > > > > the kernel boot time is ~20 seconds after 6.10 kernel.
> > > > > > > > > it's ~7 seconds on 6.8 kernel version.
> > > > > > > > >
> > > > > > > > > When secureboot is disabled,
> > > > > > > > > the boot time is ~7 seconds too.
> > > > > > > > >
> > > > > > > > > Reproduced on both AMD and Intel platform on ThinkPad
> > > > > > > > > X1 and
> > > > > > > > > T14.
> > > > > > > > >
> > > > > > > > > It probably caused autologin failure and micmute led
> > > > > > > > > not
> > > > > > > > > loaded on AMD platform.
> > > > > > > >
> > > > > > > > It was later bisected to the change mentioned above. See
> > > > > > > > the
> > > > > > > > ticket for
> > > > > > > > more details.
> > > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > I suspect I encountered the same problem:
> > > > > > >
> > > > > > > https://lore.kernel.org/linux-integrity/b8a7b3566e6014ba102ab98e10ede0d574d8930e.camel@xxxxxxxxxxxxxxx/
> > > > > > >
> > > > > > > Going to provide more info there.
> > > > > >
> > > > > > > I suppose you are going to try to acquire the tracing data I
> > > > > > > asked for? That would be awesome, thanks for taking the
> > > > > > > trouble. Let's look at the data and draw conclusions based on
> > > > > > > that.
> > > > > >
> > > > > > > The workaround is pretty simple: setting
> > > > > > > CONFIG_TCG_TPM2_HMAC=n in the kernel configuration disables
> > > > > > > the feature.
> > > > > >
> > > > > > > As for deciding what to do with this, we are talking about an
> > > > > > > estimated ~2 week window, given that the Vienna conference
> > > > > > > slows things down, so I hope my workaround is good enough
> > > > > > > until then.
> > > > >
> > > > > I can enumerate three most likely ways to address the issue:
> > > > >
> > > > > 1. Strongest: drop from defconfig.
> > > > > > 2. Medium: leave it in defconfig but add an opt-in kernel
> > > > > > command-line parameter.
> > > > > > 3. Lightest: if we can nail the regression based on tracing
> > > > > > data within a sustainable schedule, fix it.
> > > >
> > > > Actually, there's a fourth: not use sessions for the PCR extend
> > > > (if
> > > > we'd got the timings when I asked, this was going to be my
> > > > suggestion
> > > > if they came back problematic). This seems only to be a problem
> > > > for
> > > > IMA measured boot (because it does lots of extends). If
> > > > necessary this
> > > > could even be wrapped in a separate config or boot option that
> > > > only
> > > > disables HMAC on extend if IMA (so we still get security for
> > > > things
> > > > like sd-boot)
> > >
> > > > I can buy that, but with a twist: make it an opt-in kernel command
> > > > line option. We don't want to take already existing functionality
> > > > away from those who might want to use it (given e.g. hardening
> > > > requirements), and on that basis opt-in (disabled by default) would
> > > > be a more balanced way to address the issue.
> > >
> > > > Please do send a patch!
> >
> > > I made a few measurements. I have a Fedora 38 VM with TPM passthrough.
> >
> > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host)
> >
> > QEMU:
> >
> > > rc qemu-kvm 1:4.2-3ubuntu6.27
> > > ii qemu-system-x86 1:6.2+dfsg-2ubuntu6.22
> >
> >
> > TPM2_PT_MANUFACTURER:
> > raw: 0x49465800
> > value: "IFX"
> > TPM2_PT_VENDOR_STRING_1:
> > raw: 0x534C4239
> > value: "SLB9"
> > TPM2_PT_VENDOR_STRING_2:
> > raw: 0x36373000
> > value: "670"
> >
> >
> > No HMAC:
> >
> > # tracer: function_graph
> > #
> > # CPU DURATION FUNCTION CALLS
> > # | | | | | | |
> > 0) | tpm2_pcr_extend() {
> > 0) 1.112 us | tpm_buf_append_hmac_session();
> > 0) # 6360.029 us | tpm_transmit_cmd();
> > 0) # 6415.012 us | }
> >
> >
> > HMAC:
> >
> > # tracer: function_graph
> > #
> > # CPU DURATION FUNCTION CALLS
> > # | | | | | | |
> > 1) | tpm2_pcr_extend() {
> > 1) | tpm2_start_auth_session() {
> > 1) * 36976.99 us | tpm_transmit_cmd();
> > 1) * 84746.51 us | tpm_transmit_cmd();
> > 1) # 3195.083 us | tpm_transmit_cmd();
> > 1) @ 126795.1 us | }
> > 1) 2.254 us | tpm_buf_append_hmac_session();
> > 1) 3.546 us | tpm_buf_fill_hmac_session();
> > 1) * 24356.46 us | tpm_transmit_cmd();
> > 1) 3.496 us | tpm_buf_check_hmac_response();
> > 1) @ 151171.0 us | }
>
> Well, unfortunately, that tells us that it's the TPM itself that's
> taking the time processing the security overhead. The ordering of the
> commands in tpm2_start_auth_session() shows
>
> 37ms for context restore of null key
> 85ms for start session with encrypted salt
> 3ms to flush null key
> -----
> 125ms
>
> If we context save the session, we'd likely only bear a single 37ms
> cost to restore it (replacing the total 125ms). However, there's
> nothing we can do about the extend execution going from 6ms to 24ms, so
> I could halve your current boot time with security enabled (it's
> currently 149ms, it would go to 61ms, but it's still 10x slower than
> the unsecured extend at 6ms)
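For reference, the arithmetic in the quoted estimate can be reproduced from the trace figures above. This is only a rough sketch: the durations are copied from the function_graph output, the constant and function names are made up for illustration, and the projected figure rests on the quoted assumption that restoring a context-saved session would cost about the same as the ~37ms null key restore.

```python
# Durations in microseconds, copied from the function_graph traces quoted above.
NO_HMAC_EXTEND_US = 6415.012  # tpm2_pcr_extend() total without HMAC

# The three tpm_transmit_cmd() calls inside tpm2_start_auth_session():
# context restore of null key, start session with encrypted salt, flush null key.
SESSION_CMDS_US = [36976.99, 84746.51, 3195.083]

HMAC_EXTEND_CMD_US = 24356.46  # the extend command itself with HMAC


def estimate_ms():
    """Return the per-extend cost estimates in milliseconds."""
    session_us = sum(SESSION_CMDS_US)
    current_us = session_us + HMAC_EXTEND_CMD_US
    # Assumption (from the quoted mail, not measured): a saved session
    # costs one restore, taken to equal the null key restore time.
    projected_us = SESSION_CMDS_US[0] + HMAC_EXTEND_CMD_US
    return {
        "session_ms": session_us / 1000.0,
        "current_ms": current_us / 1000.0,
        "projected_ms": projected_us / 1000.0,
        "projected_vs_no_hmac": projected_us / NO_HMAC_EXTEND_US,
    }


if __name__ == "__main__":
    for name, value in estimate_ms().items():
        print(f"{name}: {value:.1f}")
```

Rounded, this gives roughly 125ms of session setup, 149ms per extend today and 61ms with a saved session, which is still just under 10x the extend without HMAC.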
Please address how this discussion is related to https://bugzilla.kernel.org/show_bug.cgi?id=219229
I just read the bug report, and it says nothing about IMA or PCR extend.
There is now tons of spam about a performance issue in a patch set that is
not in the mainline, and barely anything about the original issue:
"
When secureboot is enabled,
the kernel boot time is ~20 seconds after 6.10 kernel.
it's ~7 seconds on 6.8 kernel version.
When secureboot is disabled,
the boot time is ~7 seconds too.
Reproduced on both AMD and Intel platform on ThinkPad X1 and T14.
It probably caused autologin failure and micmute led not loaded on AMD platform.
6.9 kernel version is not tested since not signed kernel found.
6.8, 6.10, 6.11 are tested, the first bad version is 6.10.
"
How is this going to help fix this one?
I say this once and only once: I only care about fixing code that is in
the mainline.
BR, Jarkko