RE: [PATCH v3] x86/sgx: Fix RCU Tasks stalls in EPC sanitization loop
From: Miao, Jun
Date: Wed Jun 24 2026 - 04:57:51 EST
Hi Kai,
>(Reminder: you forgot the linux-sgx@xxxxxxxxxxxxxxx).
>
Ok, + CC linux-sgx in this reply.
>Could you move some context from your v1 and refine together with the above
>two paragraphs?
Okay, what about this commit description in v5?
Subject: [PATCH v5] x86/sgx: Fix RCU Tasks stall in EPC sanitization loop
During early boot, ksgxd (the Intel Software Guard Extensions kernel
thread) iterates over all post-kexec dirty EPC pages in a tight loop,
calling cond_resched() after each page. However, on isolated CPUs
(a common configuration in cloud VMs), cond_resched() never triggers a
real context switch, because TIF_NEED_RESCHED is not set when no competing
runnable task exists on that CPU.
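The BPF LSM subsystem can invoke synchronize_rcu_tasks() at kernel boot
time. An RCU Tasks grace period cannot end until every task has passed
through a voluntary context switch, and ksgxd never does so while it is
sanitizing the EPC pages, so the grace period stalls until the loop ends.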
As a result, a VM may take a long time to boot:
[ 134.806157] rcu_tasks_wait_gp: rcu_tasks grace period number 1 (since boot) is 130631 jiffies old.
[ 248.086158] INFO: task systemd:1 blocked for more than 122 seconds.
[ 248.086491] Not tainted 6.8.0-90-generic #91-Ubuntu
[ 248.086739] 'echo 0 > /proc/sys/kernel/hung_task_timeout_secs' disables this message.
[ 248.086993] task:systemd state:D stack:0 pid:1 tgid:1 ppid:0 flags:0x00000002
[ 248.087274] Call Trace:
...
[ 248.087939] schedule_timeout+0x157/0x170
[ 248.088120] wait_for_completion+0x88/0x150
[ 248.088304] __wait_rcu_gp+0x17e/0x190
[ 248.088481] synchronize_rcu_tasks_generic+0x64/0x60
...
[ 248.089047] synchronize_rcu_tasks+0x15/0x20
[ 248.089260] register_ftrace_direct+0x31f/0x350
...
[ 248.090339] bpf_trampoline_link_prog+0x33/0x60
[ 248.090518] bpf_tracing_prog_attach+0x3c5/0x5f0
...
Replace cond_resched() with cond_resched_tasks_rcu_qs() in the
sanitization loop so that ksgxd reports an RCU Tasks quiescent state
after each page.

With this patch applied, the boot time dropped from ~50s to ~10.7s
(systemd-analyze: 724ms kernel + 1.575s initrd + 8.481s userspace = 10.782s).
[ kai: completely trim down/rewrite changelog ]
Reported-by: Challvy Tee <challvy.tee@xxxxxxxxx>
Link: https://github.com/systemd/systemd/issues/40423
Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
Tested-by: Challvy Tee <challvy.tee@xxxxxxxxx>
Suggested-by: Kai Huang <kai.huang@xxxxxxxxx>
Co-developed-by: Fan Du <fan.du@xxxxxxxxx>
Signed-off-by: Fan Du <fan.du@xxxxxxxxx>
Signed-off-by: Jun Miao <jun.miao@xxxxxxxxx>
---
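To illustrate the reasoning in the commit description, here is a minimal
userspace sketch (not kernel code, and not the patch itself). It assumes an
isolated CPU where TIF_NEED_RESCHED is never set, and it only counts how
often an RCU Tasks grace period could observe progress from the
sanitization thread. All model_* names and struct cpu_model are
hypothetical stand-ins invented for this illustration; the only kernel
behavior relied on is that cond_resched_tasks_rcu_qs() reports a quiescent
state and then behaves like cond_resched().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: every name here is hypothetical, not a kernel symbol. */
struct cpu_model {
	bool need_resched;          /* stands in for TIF_NEED_RESCHED */
	unsigned long ctx_switches; /* voluntary context switches taken */
	unsigned long qs_reports;   /* explicit RCU Tasks quiescent-state reports */
};

/* Models cond_resched(): yields only if a reschedule was requested. */
static void model_cond_resched(struct cpu_model *cpu)
{
	if (cpu->need_resched) {
		cpu->ctx_switches++;
		cpu->need_resched = false;
	}
}

/*
 * Models cond_resched_tasks_rcu_qs(): report a quiescent state first,
 * then behave like cond_resched().
 */
static void model_cond_resched_tasks_rcu_qs(struct cpu_model *cpu)
{
	cpu->qs_reports++;
	model_cond_resched(cpu);
}

/*
 * Walks nr_pages "dirty pages" on an isolated CPU (need_resched is never
 * set) and returns how many times RCU Tasks could have seen this thread
 * pass through a quiescent state.
 */
static unsigned long model_sanitize_isolated(size_t nr_pages, bool use_rcu_qs)
{
	struct cpu_model cpu = { 0 };

	for (size_t i = 0; i < nr_pages; i++) {
		/* The per-page sanitization work would happen here. */
		if (use_rcu_qs)
			model_cond_resched_tasks_rcu_qs(&cpu);
		else
			model_cond_resched(&cpu);
	}

	return cpu.ctx_switches + cpu.qs_reports;
}

/* Usage example: compare the two variants over 1000 modeled pages. */
static void model_self_check(void)
{
	/* Plain cond_resched(): no quiescent state is ever observed. */
	assert(model_sanitize_isolated(1000, false) == 0);
	/* cond_resched_tasks_rcu_qs(): one quiescent state per page. */
	assert(model_sanitize_isolated(1000, true) == 1000);
}
```

In this model the plain cond_resched() variant gives the grace period
nothing to observe until the loop exits, while the
cond_resched_tasks_rcu_qs() variant reports once per page, which is the
behavior the patch relies on.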
Warm regards
Jun Miao