Re: 5.3 boot regression caused by 5.3 TPM changes

From: Hans de Goede
Date: Wed Aug 07 2019 - 16:40:07 EST


Hi,

On 07-08-19 22:13, Hans de Goede wrote:
Hi,

On 07-08-19 21:58, Hans de Goede wrote:
Hi,

On 05-08-19 18:01, Ard Biesheuvel wrote:
On Sun, 4 Aug 2019 at 19:12, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:

Hi,

On 04-08-19 17:33, Ard Biesheuvel wrote:
Hi Hans,

On Sun, 4 Aug 2019 at 13:00, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:

Hi All,

While testing 5.3-rc2 on an Irbis TW90 Intel Cherry Trail based
tablet I noticed that it does not boot on this device.

A git bisect points to commit 166a2809d65b ("tpm: Don't duplicate
events from the final event log in the TCG2 log")

And I can confirm that reverting just that single commit makes
the TW90 boot again.

This machine uses AptIO firmware with base component versions
of: UEFI 2.4 PI 1.3. I've tried to reproduce the problem on
a Teclast X80 Pro which is also CHT based and also uses AptIO
firmware with the same base components. But it does not reproduce
there. Neither does the problem reproduce on a CHT tablet using
InsideH20 based firmware.

Note that these devices have a software/firmware TPM-2.0
implementation, they do not have an actual TPM chip.

Comparing TPM firmware setting between the 2 AptIO based
tablets the settings are identical, but the troublesome
TW90 does have some more setting then the X80, it has
the following settings which are not shown on the X80:

Active PCR banks:ÂÂÂÂÂÂÂÂÂÂ SHA-1ÂÂÂÂÂÂÂÂ (read only)
Available PCR banks:ÂÂÂÂÂÂÂ SHA-1,SHA256Â (read only)
TPM2.0 UEFI SPEC version:ÂÂ TCG_2ÂÂÂÂÂÂÂÂ (other possible setting: TCG_1_2
Physical Presence SPEC ver: 1.2ÂÂÂÂÂÂÂÂÂÂ (other possible setting: 1.3)

I have the feeling that at least the first 2 indicate that
the previous win10 installation has actually used the
TPM, where as on the X80 the TPM is uninitialized.
Note this is just a hunch I could be completely wrong.

I would be happy to run any commands to try and debug this
or to build a kernel with some patches to gather more info.

Note any kernel patches to printk some debug stuff need
to be based on 5.3 with 166a2809d65b reverted, without that
reverted the device will not boot, and thus I cannot collect
logs without it reverted.


Are you booting a 64-bit kernel on 32-bit firmware?

Yes you are right, I must say that this is somewhat surprising
most Cherry Trail devices do use 64 bit firmware (where as Bay Trail
typically uses 32 bit). But I just checked efibootmgr output and it
says it is booting: \EFI\FEDORA\SHIMIA32.EFI so yeah 32 bit firmware.

Recent Fedora releases take care of this so seamlessly I did not
even realize...


OK, so we'll have to find out how this patch affects 64-bit code
running on 32-bit firmware.

I was not sure this really is a 32 bit firmware issue, as I believed
I saw 5.3 running fine on other 32 bit firmware devices, so I tried
this on another device with 32 bit UEFI and you're right this is a
32 bit issue.

The only EFI call in that patch is
get_config_table(), which is not actually a EFI boot service call but
a EFI stub helper that parses the config table array in the EFI system
table.

Well get_efi_config_table() is a new function in 5.3, introduced
specifically for the 166a2809d65b ("tpm: Don't duplicate events from the
final event log in the TCG2 log") commit.

It was introduced in commit 82d736ac56d7 ("Abstract out support for
locating an EFI config table") and after taking a good look at this
commit I'm pretty confident to say that the new get_efi_config_table()
function is the problem, as it is broken in multiple ways.

In itself the new get_efi_config_table() is just factoring out some
code used in drivers/firmware/efi/libstub/fdt.c into a new helper
for reuse and not making any functional changes to the factored out
code.

The problem is that the old code which it factors out contains a number
of assumptions which are true in the get_fdt() context from which it
was called but are not true when used in more generic code as is done
from the 166a2809d65b ("tpm: Don't duplicate events from the
final event log in the TCG2 log") commit.

There are 2 problems with the new get_efi_config_table() function:

1) sys_table->tables contains a physical address, we cannot just
cast that to a pointer and deref it, it needs to be early_memremap-ed
and then we deref the mapping. I'm somewhat amazed that this works
at all for the 64 bit case, but apparently it does.

2) sys_table->tables points to either an array of either
efi_config_table_64_t structd or an array of efi_config_table_32_t
structs. efi_config_table_t is a generic type for storing information
when parsing it should NOT be used for reading the actual tables
as they come from the firmware when parsing! Now efi_config_table_t
happens to be an exact match for efi_config_table_64_t when building
an x86_64 kernel, so this happens to work for the 64 bit firmware case.

The properway to deal with this would be to use the existing
efi_config_parse_tables() functionality from drivers/firmware/efi/efi.c
by adding entry for the LINUX_EFI_TPM_FINAL_LOG_GUID to the
common_tables[] array in drivers/firmware/efi/efi.c and make that
entry store the table address (if found) in a new efi.final_log
member.

There actually already is a efi.tpm_final_log member where the
table's physical address is waiting for us all pre-parsed and ready
to use ...

Oops, I missed the code in question is running really early during
boot, so please forget most of my rambling above, as it is wrong
given how early during boot this is happening.

Regards,

Hans