Re: [PATCH][GIT PULL] tracing: Fix compile issue for trace_sched_wakeup.c

From: Jason Baron
Date: Thu Oct 21 2010 - 10:01:23 EST


On Thu, Oct 21, 2010 at 11:58:56AM +0900, Masami Hiramatsu wrote:
> (2010/10/21 1:43), Jason Baron wrote:
> > On Wed, Oct 20, 2010 at 05:40:45PM +0200, Ingo Molnar wrote:
> >> FYI, there's a new mystery hang (sometimes crash) that triggers in -tip - and which
> >> seems to be tracing related. See the crashlog below - config attached.
> >>
> >> It's not bisectable - small changes in the kernel make the bug come/go. (might be a
> >> race of some sorts)
> >>
> >> Thanks,
> >>
> >> Ingo
> >>
> >
> > Strange, b/c it looks like we get through enabling/disabling the
> > tracepoints individually, but then when we go to enable all the
> > tracepoints we hit this hang - perhaps suggesting a race. Do we always
> > fail after "Testing all events:" is printed? Does the crash have any
> > more clues? I will try to reproduce this.
> >
> > Also, I noticed some recent changes to text_poke_smp() usage of
> > stop_machine() on Oct. 14th. That's related to the area where this appears
> > to hang, so if things were working with this .config before then, that
> > might be a place to look. Adding Masami to the 'cc list.
>
> Recent changes to text_poke_smp() just removed unnecessary
> get/put_online_cpu(), so I think it's not related to this bug.
>
> It seems there can be a bug in the stop_machine() routine under
> heavy use. Usually it is called just once at a time, but jump
> label and optprobe might call it heavily (thousands of times?).
> So a racy situation can easily happen.
>

For most tracepoints there is 1 text location that needs to be
updated. However, I know that for kmalloc, you can end up with
hundreds or even thousands. So yes, we can end up calling
stop_machine() thousands of times.

There is a patch to reduce kmalloc tracepoint text locations by moving
them out of line: http://lkml.org/lkml/2010/10/13/208

Also, text_poke_smp_batch() would allow us to update all these text
locations at once.
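
To make the difference concrete, here is a rough user-space sketch, not
kernel code. It only counts how many stop-the-world rendezvous each
approach needs. Every name in it (patch_site, apply_stopped, stop_calls,
patch_one_at_a_time, patch_batched) is made up for illustration, and it
does not reflect the real signatures of stop_machine() or
text_poke_smp_batch().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one text location to patch; not a kernel type. */
struct patch_site {
	unsigned char *addr;	/* where to write */
	unsigned char opcode;	/* byte to write (real patches span several bytes) */
};

/* Counts simulated "stop all CPUs" rendezvous; stands in for stop_machine(). */
static unsigned long stop_calls;

/* Pretend every CPU is stopped once, then write all n sites. */
static void apply_stopped(const struct patch_site *sites, size_t n)
{
	assert(sites != NULL || n == 0);
	stop_calls++;
	for (size_t i = 0; i < n; i++)
		*sites[i].addr = sites[i].opcode;
}

/* One rendezvous per text location: n sites cost n stops. */
static void patch_one_at_a_time(const struct patch_site *sites, size_t n)
{
	for (size_t i = 0; i < n; i++)
		apply_stopped(&sites[i], 1);
}

/* One rendezvous for the whole array: n sites cost a single stop. */
static void patch_batched(const struct patch_site *sites, size_t n)
{
	if (n)
		apply_stopped(sites, n);
}
```

With 1000 sites, patch_one_at_a_time() bumps stop_calls 1000 times while
patch_batched() bumps it once. Batching shrinks the window in which a
race can be hit, but it would not by itself fix one.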

Nonetheless, there appears to be an underlying race condition...

thanks,

-Jason