Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem

From: Frederic Weisbecker
Date: Thu Dec 27 2018 - 20:31:43 EST


On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>
> OK, did as you advised and here comes the trace. That's the related dmesg part:
>
> [ 1479.025092] x86: Booting SMP configuration:
> [ 1479.025129] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 1479.094715] NOHZ: local_softirq_pending 202
> [ 1479.096557] smpboot: CPU 1 is now offline
>
> Hope it helps.
> Heiner
>
>
> # tracer: nop
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
[...]
> <idle>-0 [001] d.h2 1479.111017: softirq_raise: vec=9 [action=RCU]
> <idle>-0 [001] d.h2 1479.111026: softirq_raise: vec=7 [action=SCHED]
> <idle>-0 [001] ..s2 1479.111035: softirq_entry: vec=1 [action=TIMER]
> <idle>-0 [001] ..s2 1479.111040: softirq_exit: vec=1 [action=TIMER]
> <idle>-0 [001] ..s2 1479.111040: softirq_entry: vec=7 [action=SCHED]
> <idle>-0 [001] ..s2 1479.111052: softirq_exit: vec=7 [action=SCHED]
> <idle>-0 [001] ..s2 1479.111052: softirq_entry: vec=9 [action=RCU]
> <idle>-0 [001] .Ns2 1479.111079: softirq_exit: vec=9 [action=RCU]
> cpuhp/1-13 [001] dNh2 1479.112930: softirq_raise: vec=1 [action=TIMER]
> cpuhp/1-13 [001] dNh2 1479.112935: softirq_raise: vec=9 [action=RCU]

Interesting, the softirq is raised from hardirq but it's not handled in the end of
the IRQ. Are you running threaded IRQS by any chance? If so I would expect ksoftirqd
to handle the pending work before we go idle. However I can imagine a small window
where such an expectation may not be met: if the softirq is raised after the ksoftirqd
thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
(CPUHP_TEARDOWN_CPU).

I don't know if we can afford to ignore a softirq even at this late stage. We should
probably avoid leaking any. So here is a possible fix, if you don't mind trying:

diff --git a/kernel/softirq.c b/kernel/softirq.c
index d288133..716096b 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -56,6 +56,7 @@ EXPORT_PER_CPU_SYMBOL(irq_stat);
static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;

DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(int, ksoftirqd_parked);

const char * const softirq_to_name[NR_SOFTIRQS] = {
"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
@@ -363,7 +364,7 @@ static inline void invoke_softirq(void)
if (ksoftirqd_running(local_softirq_pending()))
return;

- if (!force_irqthreads) {
+ if (!force_irqthreads || __this_cpu_read(ksoftirqd_parked)) {
#ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
/*
* We can safely execute softirq on the current stack if
@@ -659,6 +660,22 @@ static void run_ksoftirqd(unsigned int cpu)
local_irq_enable();
}

+static void ksoftirqd_park(unsigned int cpu)
+{
+ local_irq_disable();
+ __this_cpu_write(ksoftirqd_parked, 1);
+
+ if (local_softirq_pending())
+ run_ksoftirqd(cpu);
+
+ local_irq_enable();
+}
+
+static void ksoftirqd_unpark(unsigned int cpu)
+{
+ __this_cpu_write(ksoftirqd_parked, 0);
+}
+
#ifdef CONFIG_HOTPLUG_CPU
/*
* tasklet_kill_immediate is called to remove a tasklet which can already be
@@ -724,6 +741,8 @@ static int takeover_tasklets(unsigned int cpu)
static struct smp_hotplug_thread softirq_threads = {
.store = &ksoftirqd,
.thread_should_run = ksoftirqd_should_run,
+ .park = ksoftirqd_park,
+ .unpark = ksoftirqd_unpark,
.thread_fn = run_ksoftirqd,
.thread_comm = "ksoftirqd/%u",
};