[RFC 0/2] efi/memattr: Fix memory corruption and warning issues
From: Usama Arif
Date: Wed Jan 08 2025 - 17:00:30 EST
Since the patch with the warning in [1] was merged, a very significant
number of kexec boots are producing the warning in our (Meta) fleet.
I believe there are 2 problems, the warning itself might not be
triggered on the right condition, and memory attributes
table is getting corrupted.
An example of the warning when its not triggered correctly and
is fixed by patch 1:
efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 2, desc_size == 48, num_entries == 48)
An example of the warning when memory attributes table
is getting corrupted and might possibly be fixed by patch 2:
efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 1, desc_size == 2072184435, num_entries == 3248688968)
Its clear that the desc size and num_entries are wrong.
The logic behind patch 1 is explained in its commit message.
The memory corruption is looking very similar to the problem that was
fixed by 77d48d39e99170 ("efistub/tpm: Use ACPI reclaim memory for
event log to avoid corruption"), but this time with memattr table,
where it might not be preserved during kexec. I have
not been able to reproduce this in the test machine I have
over the past couple of days (hence marked as RFC) , but its
happening often in our prod.
When this area is not reserved, it comes up as usable in
/sys/firmware/memmap. This means that kexec, which uses that memmap
to find usable memory regions, can select the region where
efi_mem_attr_table is and overwrite it and relocate_kernel.
Having a fix in firmware can be difficult to get through.
The next ideal place would be in libstub. However, it looks like
InstallMemoryAttributesTable [2] is not available as a boot service
call option [3], [4], I tried to use install_configuration_table as
a substitute, but its not valid and corrupts the MemoryAttributesTable.
The prints I got from the below code in coverletter were:
EFI stub: ERROR: KKK tbl 5f19e018 tbl_>version=1, tbl->num_entries 48 tbl->desc_size 48
EFI stub: ERROR: KKK2 tbl 67184018 tbl_>version=2048, tbl->num_entries 0 tbl->desc_size 0
which shows the table got corrupted. This can bee seen in the kernel boot as well after
(with the version showing up as 2048).
As a last option for a fix, the patch marks that region as reserved in
e820_table_firmware if it is currently E820_TYPE_RAM so that kexec doesn't
use it for kernel segments.
[1] https://lore.kernel.org/all/20241031175822.2952471-2-ardb+git@xxxxxxxxxx/
[2] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Core/Dxe/Misc/MemoryAttributesTable.c#L100
[3] https://github.com/tianocore/edk2/blob/42a141800c0c26a09d2344e84a89ce4097a263ae/MdeModulePkg/Core/Dxe/DxeMain/DxeMain.c#L41
[4] https://elixir.bootlin.com/linux/v6.12.6/source/drivers/firmware/efi/libstub/efistub.h#L327
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index d33ccbc4a2c6..a1a956f2d963 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -1143,6 +1143,7 @@ efi_enable_reset_attack_mitigation(void) { }
#endif
void efi_retrieve_eventlog(void);
+void efi_mem_attr_init(void);
struct screen_info *alloc_screen_info(void);
struct screen_info *__alloc_screen_info(void);
diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index 4f1fa302234d..c5b60aea342e 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -128,3 +128,35 @@ void efi_free(unsigned long size, unsigned long addr)
nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE;
efi_bs_call(free_pages, addr, nr_pages);
}
+
+void efi_mem_attr_init(void)
+{
+ efi_guid_t linux_mem_attr_guid = EFI_MEMORY_ATTRIBUTES_TABLE_GUID;
+ efi_memory_attributes_table_t *tbl = NULL;
+ efi_status_t status;
+ unsigned long size;
+
+ tbl = get_efi_config_table(linux_mem_attr_guid);
+ efi_err("KKK tbl %lx tbl_>version=%d, tbl->num_entries %d tbl->desc_size %d\n", tbl, tbl->version, tbl->num_entries, tbl->desc_size);
+
+ size = tbl->num_entries * tbl->desc_size;
+ status = efi_bs_call(allocate_pool, EFI_ACPI_RECLAIM_MEMORY,
+ sizeof(*tbl) + size, (void **)&tbl);
+
+ if (status != EFI_SUCCESS) {
+ efi_err("Unable to allocate memory for event log\n");
+ return;
+ }
+
+ status = efi_bs_call(install_configuration_table,
+ &linux_mem_attr_guid, tbl);
+
+ if (status != EFI_SUCCESS)
+ efi_err("Unable to install configuration table to update memory type\n");
+ efi_bs_call(free_pool, tbl);
+
+ /* verify if its the same table */
+ tbl = get_efi_config_table(linux_mem_attr_guid);
+ efi_err("KKK2 tbl %lx tbl_>version=%d, tbl->num_entries %d tbl->desc_size %d\n", tbl, tbl->version, tbl->num_entries, tbl->desc_size);
+
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index f8e465da344d..c0c3d278451d 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -1036,6 +1036,8 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
efi_retrieve_eventlog();
+ efi_mem_attr_init();
+
setup_graphics(boot_params);
setup_efi_pci(boot_params);
Usama Arif (2):
efi/memattr: Use desc_size instead of total size to check for
corruption
efi/memattr: add efi_mem_attr_table as a reserved region in
820_table_firmware
arch/x86/include/asm/e820/api.h | 2 ++
arch/x86/kernel/e820.c | 6 ++++++
arch/x86/platform/efi/efi.c | 9 +++++++++
drivers/firmware/efi/memattr.c | 17 +++++++----------
include/linux/efi.h | 7 +++++++
5 files changed, 31 insertions(+), 10 deletions(-)
--
2.43.5