[PATCH v2 0/6] efi/runtime-wrappers: bound the wait for EFI runtime service calls

From: Breno Leitao

Date: Fri Jun 12 2026 - 07:02:32 EST


When an EFI runtime service call hangs in firmware, the kworker on
efi_rts_wq is stuck inside the firmware call and cannot be cancelled.
The kernel currently waits indefinitely on the completion, and the
caller holds efi_runtime_lock for the duration, so every subsequent
EFI runtime caller (efivarfs, NVRAM writes, set_wakeup_time, ACPI PRM
handlers, ...) is wedged until reboot. The only externally visible
symptom is a "workqueue lockup" message and userspace processes
piling up uninterruptibly on the semaphore.

A real example from one of our NVIDIA Grace hosts:

BUG: workqueue lockup - pool cpus=28 node=0 flags=0x0 nice=0 stuck for 127s!
...
CPU: 28 PID: 590 Comm: kworker/u288:6
Workqueue: efi_rts_wq efi_call_rts
Call trace:
0x4052f11ecc (P)
0x4052f10ed4
...
__efi_rt_asm_wrapper+0x50/0x78
efi_call_rts+0x178/0x240
process_scheduled_works+0x17c/0x420
worker_thread+0x184/0x4d8
kthread+0xcc/0x1f8
ret_from_fork+0x10/0x20

PC and LR both point into EFI runtime services firmware memory, i.e.
the firmware never returned. The worker stayed stuck across the
127s / 157s / 188s "workqueue lockup" reports until external
monitoring eventually rebooted the host.

This series doesn't fix the firmware bug - that's vendor territory -
but it stops one stuck EFI call from taking the rest of userspace
down with it, and turns a generic stalled-task mystery into an
unambiguous "EFI firmware is at fault" signal in dmesg, which is
especially valuable at fleet scale where the same symptom could
otherwise be attributed to dozens of unrelated stalls.

Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
---
Changes in v2:
- Drop v1's efi_rts_dead flag; reuse the existing EFI_RUNTIME_SERVICES
  bit (cleared on timeout) and return EFI_ABORTED instead of
  EFI_TIMEOUT (per Ard).
- Also guard the non-blocking paths (set_variable /
  query_variable_info / reset_system) and park the leaked worker via
  a shared efi_rts_park_worker() reused by x86's page-fault handler.
- Split into smaller prep patches.
- Link to v1: https://lore.kernel.org/r/20260609-efi_timeout-v1-0-69a896faa805@xxxxxxxxxx

---
Breno Leitao (6):
      efi: fix stale reference to efi_recover_from_page_fault()
      efi/runtime-wrappers: handle queue_work() failure with goto exit
      efi/runtime-wrappers: check EFI_RUNTIME_SERVICES before using efi_rts_work
      efi/runtime-wrappers: bound the wait for EFI runtime service calls
      efi/runtime-wrappers: honour EFI_RUNTIME_SERVICES in the non-blocking paths
      efi/runtime-wrappers: retire the worker if a wedged call ever returns

 arch/x86/platform/efi/quirks.c          |  9 +----
 drivers/firmware/efi/runtime-wrappers.c | 65 ++++++++++++++++++++++++++++-----
 include/linux/efi.h                     |  6 ++-
 3 files changed, 61 insertions(+), 19 deletions(-)
---
base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
change-id: 20260609-efi_timeout-6f51d5bbcfb7

Best regards,
--
Breno Leitao <leitao@xxxxxxxxxx>