Re: [PATCH 0/1] sched: Restore PREEMPT_NONE as default
From: Peter Zijlstra
Date: Tue Apr 07 2026 - 04:49:29 EST
On Sat, Apr 04, 2026 at 01:42:22PM -0400, Andres Freund wrote:
> Hi,
>
> On 2026-04-03 23:32:07 +0200, Peter Zijlstra wrote:
> > On Fri, Apr 03, 2026 at 07:19:36PM +0000, Salvatore Dipietro wrote:
> > > We are reporting a throughput and latency regression on PostgreSQL
> > > pgbench (simple-update) on arm64 caused by commit 7dadeaa6e851
> > > ("sched: Further restrict the preemption modes") introduced in
> > > v7.0-rc1.
> > >
> > > The regression manifests as a 0.51x throughput drop on a pgbench
> > > simple-update workload with 1024 clients on a 96-vCPU
> > > (AWS EC2 m8g.24xlarge) Graviton4 arm64 system. Perf profiling
> > > shows 55% of CPU time is consumed spinning in PostgreSQL's
> > > userspace spinlock (s_lock()) under PREEMPT_LAZY:
> > >
> > > |- 56.03% - StartReadBuffer
> > > |- 55.93% - GetVictimBuffer
> > > |- 55.93% - StrategyGetBuffer
> > > |- 55.60% - s_lock <<<< 55% of time
> > > | |- 0.39% - el0t_64_irq
> > > | |- 0.10% - perform_spin_delay
> > > |- 0.08% - LockBufHdr
> > > |- 0.07% - hash_search_with_hash_value
> > > |- 0.40% - WaitReadBuffers
> >
> > The fix here is to make PostgreSQL make use of rseq slice extension:
> >
> > https://lkml.kernel.org/r/20251215155615.870031952@xxxxxxxxxxxxx
> >
> > That should limit the exposure to lock holder preemption (unless
> > PostgreSQL is doing seriously egregious things).
>
> Maybe we should, but requiring the use of a new low level facility that was
> introduced in the 7.0 kernel, to address a regression that exists only in
> 7.0+, seems not great.
>
> It's not like it's a completely trivial thing to add support for either, so I
> doubt it'll be the right thing to backpatch it into already released major
> versions of postgres.

Just to clarify my response: all I really saw was 'userspace spinlock',
and we just did the rseq slice extension work (with Oracle) for exactly
this type of thing. And even NONE is susceptible to scheduling out the
lock holder.

It was also the last email I wrote on Good Friday, and thinking hard
really wasn't high on the list of things :-)

Anyway, IF we revert -- and I think you've already made a fine case for
not doing that -- it will be a very temporary thing; NONE will go away.

As to the kernel version thing: why should people upgrade to the very
latest kernel release and not also be expected to upgrade PostgreSQL to
the very latest?

If they want to use an old PostgreSQL, they can use an old kernel too,
right? Both have stable releases that should keep them afloat for a
while.

Again, not saying we can't do better, but sometimes you have to break
eggs to make cake :-)