Re: [PATCH] kthread: Prevent unpark race which puts threads on the wrong cpu

From: Thomas Gleixner
Date: Wed Apr 10 2013 - 06:51:37 EST


On Wed, 10 Apr 2013, Thomas Gleixner wrote:
> On Tue, 9 Apr 2013, Dave Hansen wrote:
>
> > On 04/09/2013 12:30 PM, Thomas Gleixner wrote:
> > > On Tue, 9 Apr 2013, Thomas Gleixner wrote:
> > > Thought more about it and found, that the stupid binding only works
> > > when the task is really descheduled. So there is a small window left,
> > > which could lead to this. Revised patch below.
> > >
> > > Anyway a trace for that issue would be appreciated nevertheless.
> >
> > Here you go:
> >
> > http://sr71.net/~dave/linux/bigbox.1365539189.txt.gz
>
> Hmm. Unfortunately migration/146 is not in the trace.
>
> Can you please apply the patch below? That avoids the oops, but might
> hang an online operation. Though the machine should stay up and you
> should be able to retrieve the trace.
>
> Thanks,
>
> tglx
> ---
> Index: linux-2.6/kernel/smpboot.c
> ===================================================================
> --- linux-2.6.orig/kernel/smpboot.c
> +++ linux-2.6/kernel/smpboot.c
> @@ -131,7 +131,10 @@ static int smpboot_thread_fn(void *data)
>  			continue;
>  		}
>
> -		BUG_ON(td->cpu != smp_processor_id());
> +		if (td->cpu != smp_processor_id()) {
> +			tracing_off();
> +			schedule();

Bah, that wants a continue. Revised patch below.

> +		}
>
>  		/* Check for state change setup */
>  		switch (td->status) {

Index: linux-2.6/kernel/smpboot.c
===================================================================
--- linux-2.6.orig/kernel/smpboot.c
+++ linux-2.6/kernel/smpboot.c
@@ -131,7 +131,11 @@ static int smpboot_thread_fn(void *data)
 			continue;
 		}

-		BUG_ON(td->cpu != smp_processor_id());
+		if (td->cpu != smp_processor_id()) {
+			tracing_off();
+			schedule();
+			continue;
+		}

 		/* Check for state change setup */
 		switch (td->status) {