Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem

From: Heiner Kallweit
Date: Sat Aug 18 2018 - 18:34:29 EST


On 18.08.2018 13:26, Thomas Gleixner wrote:
> On Thu, 16 Aug 2018, Heiner Kallweit wrote:
>
>> Recently I started to get warning "NOHZ: local_softirq_pending 202" and
>> I think it's related to mentioned commit (didn't bisect it yet).
>> See log from suspending.
>>
>> I have no reason to think the fix is wrong, it may just have revealed
>> another issue which existed before and was hidden by the bug.
>
> Looks so. That seems to be related to CPU offlining. No idea yet...
>
I checked a little further: at the time the warning is printed, the
CPU is still online but no longer active.
I can avoid the warning with the following change, but as a
disclaimer: I have no clue about this subsystem and don't really
know what I'm doing.

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5b33e2f5c..19a030e40 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -862,13 +862,13 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 {
 	/*
-	 * If this CPU is offline and it is the one which updates
+	 * If this CPU is inactive and it is the one which updates
 	 * jiffies, then give up the assignment and let it be taken by
 	 * the CPU which runs the tick timer next. If we don't drop
 	 * this here the jiffies might be stale and do_timer() never
 	 * invoked.
 	 */
-	if (unlikely(!cpu_online(cpu))) {
+	if (unlikely(!cpu_active(cpu))) {
 		if (cpu == tick_do_timer_cpu)
 			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 		/*
--



>> Rgds, Heiner
>>
>> [ 75.073353] random: crng init done
>> [ 75.073402] random: 7 urandom warning(s) missed due to ratelimiting
>> [ 78.619564] PM: suspend entry (deep)
>> [ 78.619675] PM: Syncing filesystems ... done.
>> [ 78.653684] Freezing user space processes ... (elapsed 0.002 seconds) done.
>> [ 78.656094] OOM killer disabled.
>> [ 78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> [ 78.658177] Suspending console(s) (use no_console_suspend to debug)
>> [ 78.663066] nuvoton-cir 00:07: disabled
>> [ 78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> [ 78.672210] sd 0:0:0:0: [sda] Stopping disk
>> [ 78.786651] ACPI: Preparing to enter system sleep state S3
>> [ 78.789613] PM: Saving platform NVS memory
>> [ 78.789759] Disabling non-boot CPUs ...
>> [ 78.805154] NOHZ: local_softirq_pending 202
>> [ 78.805182] NOHZ: local_softirq_pending 202
>> [ 78.807102] smpboot: CPU 1 is now offline
>>
>
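
For reference, the value in the "NOHZ: local_softirq_pending 202" lines
above can be decoded by hand. The sketch below is a small userspace
program fragment, not kernel code. It assumes the value is printed in
hex (which, as far as I can tell, is what the %02x format in
tick-sched.c does) and that the bit order matches the softirq enum in
include/linux/interrupt.h around v4.18. The helper name
decode_softirq_pending() and the name table are made up for
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Softirq names in bit order. Assumption: this mirrors the softirq
 * enum in include/linux/interrupt.h (HI_SOFTIRQ = 0 ... RCU_SOFTIRQ = 9).
 */
static const char *const softirq_names[] = {
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
	"IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};
#define NR_NAMES (sizeof(softirq_names) / sizeof(softirq_names[0]))

/*
 * Hypothetical helper: write the names of all bits set in 'pending'
 * into 'buf', separated by single spaces. Returns 'buf'.
 */
static char *decode_softirq_pending(unsigned int pending, char *buf, size_t len)
{
	size_t used = 0;

	assert(len > 0);
	buf[0] = '\0';

	for (unsigned int i = 0; i < NR_NAMES; i++) {
		int n;

		if (!(pending & (1u << i)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", softirq_names[i]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

Under those assumptions, decode_softirq_pending(0x202, buf, sizeof(buf))
yields "TIMER RCU", i.e. bits 1 and 9 are set, so the timer and RCU
softirqs were pending when the tick was about to be stopped.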