Re: [PATCH] sched: Prevent raising SCHED_SOFTIRQ when CPU is !active
From: Thomas Gleixner
Date: Tue Dec 15 2020 - 12:54:10 EST
On Tue, Dec 15 2020 at 16:05, Peter Zijlstra wrote:
> On Tue, Dec 15, 2020 at 09:34:15AM -0500, Steven Rostedt wrote:
>> On Tue, 15 Dec 2020 15:23:39 +0100 (CET)
>> Anna-Maria Behnsen <anna-maria@xxxxxxxxxxxxx> wrote:
>>
>> > > > +	/*
>> > > > +	 * Remove CPU from nohz.idle_cpus_mask to prevent participating in
>> > > > +	 * load balancing when not active
>> > > > +	 */
>> > > > +	nohz_balance_exit_idle(rq);
>> > > > +
>> > > >  	set_cpu_active(cpu, false);
>> > > >  	/*
>> > > >  	 * We've cleared cpu_active_mask, wait for all preempt-disabled and RCU
>> > >
>> > > OK, so we must clear the state before !active, because getting an
>> > > interrupt/softirq after would trigger the badness. And we're guaranteed
>> > > nothing blocks between them to re-set it.
>> >
>> > As far as I understood, it is not a problem whether the delete is before or
>> > after !active. When it is deleted after, the remote CPU will return in
>> > kick_ilb() because cpu is not idle, because it is running the hotplug
>> > thread.
>>
>> I was thinking that disabling it after may also cause some badness. Even if
>> it does not, I think there's no harm in clearing it just before setting cpu
>> active to false. And I find that the safer option.
>
> The paranoid in me wanted to write it like:
>
> 	preempt_disable();
> 	nohz_balance_exit_idle(rq);
> 	set_cpu_active(cpu, false);
> 	preempt_enable();
>
> (or possibly even local_irq_disable), to guarantee we don't hit idle
> between them (which could re-set the nohz idle state we just cleared).
>
> But then I gave up :-)

I might be missing something, but how is the CPU which runs the pinned
kernel thread, i.e. the hotplug thread, supposed to go idle between the
two calls?

Really the order is completely irrelevant.

Remote kick_ilb() checks nohz_mask _AND_ idle_cpu()
Local nohz_enter() checks cpu_active()
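
To make the two checks above concrete, here is a small userspace model in C. It is only a sketch and not kernel code: struct cpu_model, its fields, and all function names are made up for illustration. The assumptions are that the remote side only kicks a CPU which is both in the nohz mask and idle, that the local idle entry path only sets the mask bit while the CPU is active, and that the CPU is never idle during teardown because the pinned hotplug thread is running on it. The mask bit is assumed to be still set at the start, as the worst case.

```c
#include <stdbool.h>

/* Toy per-CPU state; field names are illustrative, not kernel structures. */
struct cpu_model {
	bool active;		/* stands in for cpu_active(cpu) */
	bool in_nohz_mask;	/* stands in for the CPU's bit in the nohz idle mask */
	bool idle;		/* stands in for idle_cpu(cpu) */
};

/* Remote side: kick only if the CPU is in the nohz mask AND idle. */
static bool remote_would_kick(const struct cpu_model *c)
{
	return c->in_nohz_mask && c->idle;
}

/* Local side: going idle sets the mask bit only while the CPU is active. */
static void local_enter_idle(struct cpu_model *c)
{
	c->idle = true;
	if (c->active)
		c->in_nohz_mask = true;
}

enum teardown_order { CLEAR_MASK_FIRST, CLEAR_ACTIVE_FIRST };

/*
 * Run both teardown steps in the given order while the hotplug thread
 * occupies the CPU (idle == false), then let the CPU go idle. Returns
 * true if a remote kick would have been possible at any point.
 */
static bool kick_possible_during_teardown(enum teardown_order order)
{
	struct cpu_model c = {
		.active = true,
		.in_nohz_mask = true,	/* assumed worst case: bit still set */
		.idle = false,		/* hotplug thread is running here */
	};
	bool kicked = remote_would_kick(&c);

	if (order == CLEAR_MASK_FIRST) {
		c.in_nohz_mask = false;
		kicked = kicked || remote_would_kick(&c);
		c.active = false;
	} else {
		c.active = false;
		kicked = kicked || remote_would_kick(&c);
		c.in_nohz_mask = false;
	}
	kicked = kicked || remote_would_kick(&c);

	/* Once !active, going idle must not re-set the mask bit. */
	local_enter_idle(&c);
	kicked = kicked || remote_would_kick(&c);

	return kicked;
}
```

Under those assumptions, kick_possible_during_teardown() returns false for both CLEAR_MASK_FIRST and CLEAR_ACTIVE_FIRST, which is the sense in which the order does not matter. The model says nothing about real concurrency or memory ordering.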

I still might be missing something magic though, mushrooms perhaps. :)

Thanks,
tglx