Re: [patch 17/18] sched: Enable might_sleep() checks early

From: Thomas Gleixner
Date: Tue May 16 2017 - 03:34:01 EST


On Tue, 16 May 2017, Peter Zijlstra wrote:
> On Mon, May 15, 2017 at 09:12:03PM +0200, Thomas Gleixner wrote:
> > On Mon, 15 May 2017, Steven Rostedt wrote:
> >
> > > On Sun, 14 May 2017 20:27:33 +0200
> > > Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > >
> > > > might_sleep() checks are enabled after the boot process is done. That hides
> > > > bugs in the smp bringup and driver initialization code.
> > > >
> > > > Enable it right when the scheduler starts working, i.e. when init task and
> > > > kthreadd have been created and right before the idle task enables
> > > > preemption.
> > >
> > > Looking at commit b433c3d4549ae749, it appears that on very slow
> > > machines, there is a possibility that the init task can start running.
> > > Should system_state be updated before that complete() is called?
> >
> > That commit is magic voodoo with exactly no effect at all.
> >
> > rest_init() is called with preemption disabled and nothing can schedule
> > there _before_ schedule_preempt_disabled().
> >
> > Both threads - init task and kthreadd - are only created and woken up. They
> > cannot get on the CPU simply because preemption is disabled. And this was
> > the case back then in 2.6.35 as well.
> >
> > It does not matter at all whether the machine is slow or not. That
> > completion is pointless.
> >
> > Peter, can you explain what the heck this patch is actually doing?
>
> Argh.. what a shit Changelog, who wrote that crap!?

Indeed.

> So the problem was with PREEMPT_VOLUNTARY (where, as you know,
> preempt_disable() has no meaning).
>
> Supposedly there's a might_sleep()/cond_resched() point somewhere around
> there (every alloc in the fork path for example), which will happily
> reschedule us.

Darn, forgot about PREEMPT_VOLUNTARY and that excellent changelog does not
mention it either.

> So if we schedule to the kernel_init() task before we set kthreadd_task
> we'll try and spawn kthreads and OOPS.

So back to Steven's question. No, we can't set the state earlier than right
before schedule() simply because with PREEMPT, preemption _is_ actually
disabled and kernel_thread() will trigger might_sleep() splats.

What a mess.

Thanks,

tglx