[PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH 7/7]printk: Wake up klogd using irq_work)

From: Steven Rostedt
Date: Thu Nov 15 2012 - 11:34:12 EST


On Thu, 2012-11-15 at 16:25 +0100, Frederic Weisbecker wrote:
> 2012/11/15 Steven Rostedt <rostedt@xxxxxxxxxxx>:
> > On Wed, 2012-11-14 at 21:37 +0100, Frederic Weisbecker wrote:
> >> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> >> index f249e8c..822d757 100644
> >> --- a/kernel/time/tick-sched.c
> >> +++ b/kernel/time/tick-sched.c
> >> @@ -289,7 +289,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> >> time_delta = timekeeping_max_deferment();
> >> } while (read_seqretry(&xtime_lock, seq));
> >>
> >> - if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
> >> + if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
> >
> > If the CPU is going offline, the printk_tick() would be executed here.
> > But now that printk_tick() is done with the irq_work code, it wont be
> > executed till the next tick. Could this cause a missed printk because
> > of this, if the cpu is going offline?
> >
> > Actually, how does irq_work in general handle cpu offline work?
>
> Good point, and that's not trivial to solve.
>
> The hotplug down sequence does:
>
> ----->
> CPU that offilines CPU offlining
> -----------------
> ---------------------
> cpu_down() {
> __stop_machine(take_cpu_down)
>
> take_cpu_down() {
>
> __cpu_disable() {
>
> * disable irqs in hw
>
> * clear from online mask
> }
>
> move all tasks somewhere
> }
> while (!idle_cpu(offlining))
> cpu_relax()
>
> cpu_die();
> <---------
>
> So the offlining CPU goes to idle in the end once irqs are disabled in
> the apic level. Does that include the timer tick? If so then the last
> resort to offline without irq works in the queue is to make
> take_cpu_down() ask for a retry if there are pending irq works during
> its execution.
>
> Now if we have printk() calls between __cpu_disable() and the idle
> loop, they will be lost until the next onlining. Unless we do an
> explicit call to printk_tick() from the idle loop if the CPU is
> offline.
>
> Note that !CONFIG_NO_HZ doesn't seem to handle that. Which makes me
> wonder if the tick is really part of the whole IRQ disablement done in
> __cpu_disable().


How about flushing all irq_work from CPU_DYING. The notifier is called
by stop_machine on the CPU that is going down. Grant you, the code will
not be called from irq context (so things like get_irq_regs() wont work)
but I'm not sure what the requirements are for irq_work in that regard
(Peter?). But irqs are disabled and the CPU is about to go offline.
Might as well flush the work.

I ran this against my stress_cpu_hotplug script (attached) and it seemed
to work fine. I even did a:

perf record ./stress-cpu-hotplug

Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>

Index: linux-rt.git/kernel/irq_work.c
===================================================================
--- linux-rt.git.orig/kernel/irq_work.c
+++ linux-rt.git/kernel/irq_work.c
@@ -14,6 +14,7 @@
#include <linux/irqflags.h>
#include <linux/sched.h>
#include <linux/tick.h>
+#include <linux/cpu.h>
#include <asm/processor.h>


@@ -105,11 +106,7 @@ bool irq_work_needs_cpu(void)
return true;
}

-/*
- * Run the irq_work entries on this cpu. Requires to be ran from hardirq
- * context with local IRQs disabled.
- */
-void irq_work_run(void)
+static void __irq_work_run(void)
{
unsigned long flags;
struct irq_work *work;
@@ -128,7 +125,6 @@ void irq_work_run(void)
if (llist_empty(this_list))
return;

- BUG_ON(!in_irq());
BUG_ON(!irqs_disabled());

llnode = llist_del_all(this_list);
@@ -155,8 +151,23 @@ void irq_work_run(void)
(void)cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
}
}
+
+/*
+ * Run the irq_work entries on this cpu. Requires to be ran from hardirq
+ * context with local IRQs disabled.
+ */
+void irq_work_run(void)
+{
+ BUG_ON(!in_irq());
+ __irq_work_run();
+}
EXPORT_SYMBOL_GPL(irq_work_run);

+static void irq_work_run_cpu_down(void)
+{
+ __irq_work_run();
+}
+
/*
* Synchronize against the irq_work @entry, ensures the entry is not
* currently in use.
@@ -169,3 +180,35 @@ void irq_work_sync(struct irq_work *work
cpu_relax();
}
EXPORT_SYMBOL_GPL(irq_work_sync);
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int irq_work_cpu_notify(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ long cpu = (long)hcpu;
+
+ switch (action) {
+ case CPU_DYING:
+ /* Called from stop_machine */
+ if (WARN_ON_ONCE(cpu != smp_processor_id()))
+ break;
+ irq_work_run_cpu_down();
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block cpu_notify;
+
+static __init int irq_work_init_cpu_notifier(void)
+{
+ cpu_notify.notifier_call = irq_work_cpu_notify;
+ cpu_notify.priority = 0;
+ register_cpu_notifier(&cpu_notify);
+ return 0;
+}
+device_initcall(irq_work_init_cpu_notifier);
+
+#endif /* CONFIG_HOTPLUG_CPU */

Attachment: stress-cpu-hotplug
Description: application/shellscript