Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

From: Michael Wang
Date: Tue May 21 2013 - 03:59:19 EST


On 05/21/2013 03:21 PM, Borislav Petkov wrote:
> On Tue, May 21, 2013 at 10:20:51AM +0800, Michael Wang wrote:
>> This is not enough to prove that policy->cpus is wrong: the cpu could
>> be online when read from policy->cpus but offline when checked here,
>> since hotplug can happen in between.
>
> Strictly speaking you're correct but I don't do any hotplug besides the
> one-time thing which is part of halting the box.

Well, they share the same cpu_down() I suppose...

>
>> I don't get it...
>>
>> get_online_cpus() just stops hotplug from happening after it is
>> invoked, so unless policy->cpus is really wrong, none of the cpus in
>> the mask will go offline any more.
>
> Yes, that's my impression too - at the point we do gov_queue_work,
> policy->cpus already contains offline cpus.
>
>> This protects nothing... before we get here, the cpu could already
>> be offline, nothing changed...
>
> Yes, but I don't want to schedule work on an offlined cpu and that is
> ensured here.

IMHO, the problem looks mostly like wrong usage of policy->cpus: it
provides the right info, but only as of the time it is read. We don't
need to worry about work on an offlined cpu if we don't allow the cpu
to disappear in between.

Your approach could be good with respect to performance, but if we
could first prove that policy->cpus is correct, then we could fix the
problem without any concern, couldn't we?
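
To illustrate the window I mean, here is a minimal user-space sketch in
Python, not kernel code. It is only a toy model: HotplugModel,
queue_work and the interleaved callback are hypothetical names made up
for illustration, and the real cpu_down() would block on the hotplug
refcount rather than defer as it does here. It also assumes that
policy->cpus is correct at the time it is read.

```python
class HotplugModel:
    """Toy model of the hotplug refcount; not the kernel implementation."""

    def __init__(self, ncpus):
        self.online = set(range(ncpus))
        self.readers = 0        # models the get_online_cpus() refcount
        self.pending_down = []  # cpu_down() requests deferred by readers

    def get_online_cpus(self):
        self.readers += 1

    def put_online_cpus(self):
        assert self.readers > 0
        self.readers -= 1
        if self.readers == 0:
            # Deferred hotplug requests proceed once no reader is left.
            for cpu in self.pending_down:
                self.online.discard(cpu)
            self.pending_down.clear()

    def cpu_down(self, cpu):
        if self.readers:
            # Assumption: the real cpu_down() would sleep here instead.
            self.pending_down.append(cpu)
            return False
        self.online.discard(cpu)
        return True


def queue_work(hp, policy_cpus, hold_hotplug, interleaved):
    """Return the cpus that were already offline when work was queued."""
    if hold_hotplug:
        hp.get_online_cpus()
    try:
        snapshot = sorted(policy_cpus)  # read the mask
        interleaved()                   # hotplug hits right after the read
        return [cpu for cpu in snapshot if cpu not in hp.online]
    finally:
        if hold_hotplug:
            hp.put_online_cpus()


if __name__ == "__main__":
    for hold in (False, True):
        hp = HotplugModel(4)
        bad = queue_work(hp, {0, 1}, hold, lambda: hp.cpu_down(1))
        print("hold_hotplug =", hold, "queued on offline cpus:", bad)
```

Without the hold, the mask was right when read but a cpu vanished before
the work was queued. With the hold, the same mask stays valid until the
work is queued, and the cpu goes down only afterwards.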

>
>> If you really want to confirm that policy->cpus was wrong, the way
>> to do it is to apply the fix I suggested, then check online here.
>
> Sure, feel free to get a box, enable NO_HZ_FULL and do all the
> experimentations you desire. I surely cannot be the only one who
> triggers this.

I'm fine as long as the problem gets solved, meaning your box doesn't
show the WARN any more :)

Regards,
Michael Wang
