Re: [PATCH 0/1] sched: Restore PREEMPT_NONE as default
From: Peter Zijlstra
Date: Fri Apr 03 2026 - 17:32:26 EST
On Fri, Apr 03, 2026 at 07:19:36PM +0000, Salvatore Dipietro wrote:
> We are reporting a throughput and latency regression on PostgreSQL
> pgbench (simple-update) on arm64 caused by commit 7dadeaa6e851
> ("sched: Further restrict the preemption modes") introduced in
> v7.0-rc1.
>
> The regression manifests as throughput dropping to 0.51x of the
> pre-commit baseline on a pgbench simple-update workload with 1024
> clients on a 96-vCPU (AWS EC2 m8g.24xlarge) Graviton4 arm64 system.
> Perf profiling shows 55% of CPU time is consumed spinning in
> PostgreSQL's userspace spinlock (s_lock()) under PREEMPT_LAZY:
>
> |- 56.03% - StartReadBuffer
>    |- 55.93% - GetVictimBuffer
>       |- 55.93% - StrategyGetBuffer
>          |- 55.60% - s_lock                       <<<< 55% of time
>          |  |- 0.39% - el0t_64_irq
>          |  |- 0.10% - perform_spin_delay
>          |- 0.08% - LockBufHdr
>          |- 0.07% - hash_search_with_hash_value
> |- 0.40% - WaitReadBuffers
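[For reference, the reported workload corresponds roughly to a pgbench
run like the one below. The scale factor, run duration, thread count,
and database name are assumptions, not taken from the report; only the
simple-update script and the 1024-client count come from it.]

```shell
# Hypothetical reproduction sketch of the reported workload.
# Scale factor (-s), duration (-T), threads (-j) and db name assumed.
pgbench -i -s 1000 bench
pgbench -b simple-update -c 1024 -j 96 -T 300 bench
```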
The fix here is for PostgreSQL to make use of the rseq slice extension:
https://lkml.kernel.org/r/20251215155615.870031952@xxxxxxxxxxxxx
That should limit the exposure to lock holder preemption (unless
PostgreSQL is doing seriously egregious things).