Re: [PATCH 0/1] sched: Restore PREEMPT_NONE as default
From: Mitsumasa KONDO
Date: Sun Apr 05 2026 - 10:45:04 EST
I believe the root cause is the inadequacy of PostgreSQL's arm64
SPIN_DELAY() implementation, which PREEMPT_LAZY merely exposed.
PostgreSQL's SPIN_DELAY() uses dramatically different instructions
per architecture (src/include/storage/s_lock.h):
  x86_64:  rep; nop  (PAUSE, ~140 cycles)
  arm64:   isb       (pipeline flush, ~10-20 cycles)
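For reference, the two delay primitives look roughly like this. This is
a hedged approximation of the s_lock.h macros, not a verbatim copy, and
the helper name is invented for illustration:

```c
/* Approximation of PostgreSQL's per-architecture SPIN_DELAY();
 * spin_delay_approx() is an invented name, not PostgreSQL's. */
static inline void
spin_delay_approx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* PAUSE: spin-wait hint; stalls the pipeline for roughly tens
     * to ~140 cycles depending on the microarchitecture. */
    __asm__ __volatile__(" rep; nop \n" ::: "memory");
#elif defined(__aarch64__)
    /* ISB: flushes the pipeline only, giving ~10-20 cycles of
     * delay; far cheaper per iteration than PAUSE. */
    __asm__ __volatile__(" isb \n" ::: "memory");
#endif
}
```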
Under PREEMPT_NONE, lock holders are rarely preempted, so spin
duration is short and ISB's lightweight delay is sufficient.
Under PREEMPT_LAZY, lock holder preemption becomes more frequent.
When this occurs, waiters enter a sustained spin loop. On arm64,
ISB provides negligible delay, so the loop runs at near-full speed,
hammering the lock cacheline via TAS_SPIN's *(lock) load on every
iteration. This generates massive cache coherency traffic that in
turn slows the lock holder's execution after rescheduling, creating
a feedback loop that escalates on high-core-count systems.
On x86_64, PAUSE throttles this loop sufficiently to prevent the
feedback loop, which explains why this is not reproducible there.
Patching PostgreSQL's arm64 spin_delay() to use WFE instead of ISB
should significantly reduce the regression without kernel changes.
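A minimal sketch of that idea, using C11 atomics rather than
PostgreSQL's actual TAS()/TAS_SPIN() macros; all names here are
invented, and the WFE wake-up behavior assumes the kernel's event
stream (or a store to the monitored line) eventually wakes waiters:

```c
#include <stdatomic.h>

typedef struct { atomic_int value; } slock_sketch;  /* invented type */

static inline void
cpu_relax_sketch(void)
{
#if defined(__aarch64__)
    /* WFE: parks the core until an event (SEV, interrupt, or the
     * periodic event stream), throttling the loop far more than
     * ISB does and keeping coherency traffic low. */
    __asm__ __volatile__(" wfe \n" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
    /* PAUSE already throttles adequately on x86_64. */
    __asm__ __volatile__(" rep; nop \n" ::: "memory");
#endif
}

static void
slock_acquire(slock_sketch *lock)
{
    for (;;)
    {
        /* The TAS step: atomic exchange to claim the lock. */
        if (atomic_exchange_explicit(&lock->value, 1,
                                     memory_order_acquire) == 0)
            return;
        /* Inner wait: a plain load keeps the cacheline in a shared
         * state instead of bouncing it with RMW traffic on every
         * iteration. */
        while (atomic_load_explicit(&lock->value,
                                    memory_order_relaxed) != 0)
            cpu_relax_sketch();
    }
}

static void
slock_release(slock_sketch *lock)
{
    atomic_store_explicit(&lock->value, 0, memory_order_release);
}
```

The inner plain-load loop mirrors the TAS_SPIN-style re-check
described above; swapping its delay from ISB to WFE is the only
arm64-specific change being proposed.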
That said, the PREEMPT_LAZY default is likely to cause similar
breakage in other user-space applications beyond PostgreSQL that
rely on lightweight spin loops on arm64, so I agree that the patch
to retain PREEMPT_NONE is the right approach. Alternatively,
distributions can resolve this themselves by patching their default
kernel configuration.
Regards,
--
Mitsumasa Kondo
NTT Software Innovation Center