Re: [PATCH 0/1] sched: Restore PREEMPT_NONE as default

From: Andres Freund

Date: Sun Apr 05 2026 - 12:43:25 EST


Hi,

On 2026-04-05 23:44:25 +0900, Mitsumasa KONDO wrote:
> I believe the root cause is the inadequacy of PostgreSQL's arm64
> spin_delay() implementation, which PREEMPT_LAZY merely exposed.
>
> PostgreSQL's SPIN_DELAY() uses dramatically different instructions
> per architecture (src/include/storage/s_lock.h):
>
> x86_64: rep; nop (PAUSE, ~140 cycles)
> arm64: isb (pipeline flush, ~10-20 cycles)
>
> Under PREEMPT_NONE, lock holders are rarely preempted, so spin
> duration is short and ISB's lightweight delay is sufficient.
>
> Under PREEMPT_LAZY, lock holder preemption becomes more frequent.
> When this occurs, waiters enter a sustained spin loop.

It's not sustained; the spinning only lasts between 10 and 1000 iterations,
after which there's randomized exponential backoff using nanosleep.

Which will actually happen after a smaller number of cycles with a shorter
SPIN_DELAY.

In the 4kB workload, nearly all backends are in the exponential backoff.

If I remove the rep; nop on x86-64, the performance of the 4kB pages workload
is basically unaffected, even with PREEMPT_LAZY.


The spinning helps with workloads that are contended for very short amounts of
time. But that's not the case in this workload without huge pages: instead of
low 10s of cycles, we regularly spend a few orders of magnitude more cycles
holding the lock.


That's not to say the arm64 spin delay implementation is good. It just doesn't
seem like it affects this case much.


As hinted at in my neighboring email, I see some performance differences due
to PREEMPT_LAZY even when replacing the spinlock with a futex based lock, as
long as I use 4kB pages. Which seems like the expected thing?

Greetings,

Andres Freund