Re: BUG: spinlock lockup/wrong CPU/recursion -- when readingnuma_maps on 2.6.17-rc1

From: Andrew Morton
Date: Tue Apr 18 2006 - 00:18:03 EST


Sonny Rao <sonny@xxxxxxxxxxx> wrote:
>
> Hi, I ran into a deadlock on 2.6.16-mm2 when I was running a
> multi-threaded application and was reading /proc/<pid>/numa_maps for
> the app.
>
> I recompiled with DEBUG_SPINLOCK and I can get various error messages
> on kernels ranging from 2.6.17-rc1 to serveral mm kernels including
> 2.6.16-mm[12] and 2.6.16-rc5-mm[23] (mm kernels before this seem to break
> a lot on my box)
>
> My current guess, based on my rudimentary understanding of the code, is
> that we are rescheduling while holding a spinlock in
> check_pte_range() which is called from show_numa_map() in mempolicy.c.
>
> Specifically, the gather_stats() function which is called inside
> check_pte_range() has a cond_resched() at the end. Maybe that line
> should be changed to cond_resched_lock() or should simply be removed.
>
> I'll try removing it and see what happens.

Yes, that's a bug and that cond_resched() needs to go.

We would have found this quite quickly if cond_resched() had a
might_sleep() in it. It really should have such a check, but we cannot do
this because in some configurations, might_sleep() calls cond_resched().
That was rather nasty or us. Ingo, can you think of a fix please?

This bug would also have been exposed as a scheduling-while-atomic warning
on those rare occasions when the cond_resched() actually calls schedule().
But that won't be enabled unless the NUMA guys actually test with all debug
options, as I repeatedly and apparently ineffectively have suggested.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/