Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED

From: Linus Torvalds
Date: Sat Sep 09 2023 - 01:32:20 EST


On Fri, 8 Sept 2023 at 15:50, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > which actually makes me worry about the nested irq case, because this
> > would *not* be ok:
> >
> > allow_resched();
> > -> irq happens
> > -> *nested* irq happens
> > <- nested irq return (and preemption)
> >
> > ie the allow_resched() needs to still honor the irq count, and a
> > nested irq return obviously must not cause any preemption.
>
> I think we killed nested interrupts a fair number of years ago, but I'll
> recheck -- but not today, sleep is imminent.

I don't think it has to be an interrupt. I think the TIF_ALLOW_RESCHED
thing needs to look out for any nested exception (ie only ever trigger
if it's returning to the kernel "task" stack).

Because I could easily see us wanting to do "I'm doing a big user
copy, it should do TIF_ALLOW_RESCHED, and I don't have preemption on",
and then instead of that first "irq happens", you have "page fault
happens" instead.

And inside that page fault handling you may well have critical
sections (like a spinlock), and that is fine - but the fact that the
"process context" had TIF_ALLOW_RESCHED most certainly does *not* mean
that the page fault handler can reschedule.

Maybe it already does. As mentioned, I lost sight of the patch series,
even though I saw it originally (and liked it - only realizing on your
complaint that it might be more dangerous than I thought).

Basically, the "allow resched" should be a marker for a single context
level only. Kind of like a register state bit that gets saved on the
exception stack. Not an "anything happening within this process is now
preemptible".
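
To make the "single context level" idea concrete, here is a rough
userspace sketch. It is not the patch series and not kernel code: the
struct and every model_*() name are made up for illustration, and it
assumes the flag lives per nesting level, with level 0 standing in for
the kernel "task" stack.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 8

/* Hypothetical model: one allow-resched bit per context level. */
struct ctx_model {
	bool allow_resched[MAX_DEPTH];	/* index 0 is the task-level context */
	int depth;			/* current nesting level */
	int preemptions;		/* how often a return path "rescheduled" */
};

/* Mark only the *current* level as preemptible. */
static inline void model_allow_resched(struct ctx_model *m)
{
	m->allow_resched[m->depth] = true;
}

static inline void model_disallow_resched(struct ctx_model *m)
{
	m->allow_resched[m->depth] = false;
}

/*
 * Exception entry (irq, page fault, ...): the interrupted level keeps
 * its bit, as if saved on the exception stack, and the new level starts
 * with the bit clear.
 */
static inline void model_exception_enter(struct ctx_model *m)
{
	assert(m->depth + 1 < MAX_DEPTH);
	m->depth++;
	m->allow_resched[m->depth] = false;
}

/*
 * Exception return: preempt only when we are going back to the task
 * level and that level had set its bit. A nested return never preempts.
 */
static inline bool model_exception_return(struct ctx_model *m)
{
	assert(m->depth > 0);
	m->depth--;
	if (m->depth == 0 && m->allow_resched[0]) {
		m->preemptions++;
		return true;
	}
	return false;
}
```

With a zero-initialized struct ctx_model, calling model_allow_resched(),
then model_exception_enter() twice, the first model_exception_return()
(the nested one) reports no preemption, and only the second one, which
goes back to level 0, does.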

I'm hoping Ankur will just pipe in and say "of course I already
implemented it that way, see XYZ".

Linus