Re: [PATCH 0/1] sched: Restore PREEMPT_NONE as default
From: Peter Zijlstra
Date: Tue Apr 07 2026 - 05:08:01 EST
On Tue, Apr 07, 2026 at 10:20:18AM +0200, Peter Zijlstra wrote:
> On Sun, Apr 05, 2026 at 11:38:59AM +0530, Ritesh Harjani wrote:
>
> > However, out of curiosity, I was hoping someone more familiar with the
> > scheduler area could explain why PREEMPT_LAZY, vs. PREEMPT_NONE, causes
> > a performance regression without huge pages?
> >
> > Minor page fault handling has microsecond latency, whereas sched ticks
> > are in milliseconds. Besides, both preemption models should schedule()
> > anyway if TIF_NEED_RESCHED is set on return to userspace, right?
> >
> > So I was curious to understand how the preemption model causes a
> > performance regression with no hugepages in this case.
>
> So yes, everything can schedule on return-to-user (very much including
> NONE). Which is why rseq slice ext is heavily recommended for anything
> attempting user space spinlocks.
>
> Where the other preemption modes differ is scheduling while in kernel
> mode. So if the workload spends significant time in the kernel, this
> can cause more scheduling.
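A minimal toy model of where each preemption model may switch the current task out, written in Python for brevity. It is a sketch under stated assumptions, not kernel code: the names `may_reschedule`, `tick`, `need_resched` and `need_resched_lazy` are hypothetical stand-ins for TIF_NEED_RESCHED, TIF_NEED_RESCHED_LAZY and preempt_count, and it assumes (as I understand PREEMPT_LAZY) that a fair-class wakeup sets only the lazy flag, which the next tick upgrades to the real one. PREEMPT_VOLUNTARY and PREEMPT_RT are ignored.

```python
from enum import Enum


class Model(Enum):
    NONE = "none"
    FULL = "full"
    LAZY = "lazy"


def may_reschedule(model, returning_to_user, preempt_count,
                   need_resched, need_resched_lazy):
    """Toy model: may the current task be switched out at this point?

    need_resched / need_resched_lazy are hypothetical stand-ins for
    TIF_NEED_RESCHED and TIF_NEED_RESCHED_LAZY; preempt_count == 0 means
    preemption is enabled.
    """
    if returning_to_user:
        # Every model, NONE included, schedules on return to user.
        return need_resched or need_resched_lazy
    if preempt_count != 0:
        # Preemption disabled, e.g. while holding a page-table spinlock.
        return False
    if model is Model.NONE:
        # No involuntary preemption in kernel; only explicit cond_resched().
        return False
    # FULL and LAZY preempt in kernel only on the non-lazy flag.
    return need_resched


def tick(model, need_resched, need_resched_lazy):
    """Toy model of the tick: under LAZY a pending lazy request is upgraded."""
    if model is Model.LAZY and need_resched_lazy:
        need_resched = True
    return need_resched, need_resched_lazy


# A wakeup during a #PF under LAZY sets only the lazy flag.
flags = (False, True)
print(may_reschedule(Model.LAZY, False, 0, *flags))  # False: still in the fault
flags = tick(Model.LAZY, *flags)
print(may_reschedule(Model.LAZY, False, 0, *flags))  # True: after the next tick
print(may_reschedule(Model.NONE, False, 0, *flags))  # False: same flags, NONE
```

So a wakeup that lands in the middle of a #PF does not preempt it immediately under LAZY either; it only can once a tick has upgraded the request and the fault path is outside any spinlock. Under NONE the fault runs to completion and the switch happens on return to user.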
>
> As you already mentioned, no huge pages gives us more overhead on #PF
> (and TLB misses, but those are mostly hidden in access latency rather
> than immediate system time). This gives more system time, and more room
> to schedule.
>
> If we get preempted in the middle of a #PF, rather than finishing it,
> this increases the #PF completion time, and if userspace is trying to
> access this page concurrently... But we should see that in mmap_lock
> contention/idle time :/
Sorry, insufficient wake-up juice applied. Concurrent page faults are
serialized on the page-table (spin) locks, not mmap_lock. Waiters spin
rather than sleep, so this shows up as system time rather than idle time.
So it would increase system time and give more opportunity for kernel
preemption.
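To make "more system time, more room to schedule" concrete, a toy count of how many ticks land while a task is inside a #PF. Everything here is a hypothetical assumption, not a measurement of the reported workload: a task alternating fixed user and fault times, a 4 ms tick (HZ=250), another runnable task always waiting so a lazy request is pending, the fault path running with preemption enabled, and no accounting for time spent switched out.

```python
def in_kernel_tick_hits(user_us, fault_us, tick_us, total_us):
    """Count ticks that fire while the task is inside a #PF (toy model).

    Hypothetical assumptions: the task alternates user_us of user time with
    fault_us of kernel time, a lazy resched request is always pending, the
    fault path runs outside the page-table lock with preemption enabled, and
    time spent switched out is ignored. Under PREEMPT_LAZY each hit is a
    chance to preempt in kernel; under PREEMPT_NONE it is not.
    """
    period = user_us + fault_us
    hits = 0
    for t in range(tick_us, total_us + 1, tick_us):
        # The task is in kernel mode during the last fault_us of each period.
        if t % period >= user_us:
            hits += 1
    return hits


TICK_US = 4000          # hypothetical HZ=250
TOTAL_US = 1_000_000    # one simulated second
# A 37 us cycle does not divide the tick, so ticks sample every phase.
huge_pages = in_kernel_tick_hits(35, 2, TICK_US, TOTAL_US)     # hypothetical
no_huge_pages = in_kernel_tick_hits(30, 7, TICK_US, TOTAL_US)  # hypothetical
print(huge_pages, no_huge_pages)
```

With these made-up numbers, going from 2 us to 7 us of fault time per cycle raises the in-kernel tick hits from 14 to 48 per simulated second, roughly in proportion to the system-time fraction. Under PREEMPT_NONE the in-kernel preemption count stays at zero however many ticks hit.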