Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

From: Christoph Lameter
Date: Mon Sep 19 2005 - 13:53:02 EST


On Mon, 19 Sep 2005, Andrew Morton wrote:

> Well. The CPU_UP_CANCELED locking in cpuup_callback() looks borked to me -
> it takes cachep->nodelists[node]->list_lock and then calls
> drain_alien_cache() which appears to take the same lock. But that's not
> the problem here.
>
> The code in cache_reap() recalculates numa_node_id() multiple times, so if
> the caller changes CPUs then this assertion will trigger. However it's
> running under keventd here, which is pinned to a single CPU. Still, it
> would be useful if you could try putting preempt_disable()s in
> cache_reap(), or change cache_reap() to evaluate numa_node_id() just the
> once, and cache that in a local variable.

drain_array_cache_locked calls check_spinlock_acquired_node, which in
turn ensures that interrupts are off. So no move to a different processor
should be possible.

However, that is contradicted by __wake_up calling
drain_array_cache_locked. Did the process just wake up?

> I wonder why numa_node_id() uses raw_smp_processor_id()? That's just
> asking for preempt non-atomicity bugs.

Accessing arrays indexed by node number still works even if the process
continues executing on another node.
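
For illustration, here is a rough userspace model of the suggestion quoted
above to evaluate numa_node_id() just once and keep it in a local variable.
It is only a sketch, not slab code: sim_numa_node_id(), sim_migrate(),
struct node_lists, reap_recomputed() and reap_cached() are all made-up
names, and the "lock" is a plain flag. It assumes the task can change nodes
between two lookups, which is exactly the case being debated here.

```c
#include <assert.h>

#define MAX_NODES 4

/* Hypothetical stand-in for the per-node lists; not the kernel's struct. */
struct node_lists {
	int locked;	/* 1 while the simulated list_lock is held */
};

static struct node_lists lists[MAX_NODES];
static int current_node;	/* node the simulated task is running on */

/* Stand-in for numa_node_id(): reports the node we are on right now. */
int sim_numa_node_id(void)
{
	return current_node;
}

/* Simulate the task being moved to another node. */
void sim_migrate(int node)
{
	assert(node >= 0 && node < MAX_NODES);
	current_node = node;
}

/*
 * Fragile pattern: the node is looked up again for the "lock held" check.
 * Returns 1 if the check would pass, 0 if it would trigger.
 */
int reap_recomputed(int migrate_to)
{
	int locked_node = sim_numa_node_id();
	int ok;

	lists[locked_node].locked = 1;
	sim_migrate(migrate_to);		/* task moves in between */
	ok = lists[sim_numa_node_id()].locked;	/* second lookup */
	lists[locked_node].locked = 0;
	return ok;
}

/*
 * Suggested pattern: look the node up once and reuse the local variable,
 * so the lock and the check always refer to the same node.
 */
int reap_cached(int migrate_to)
{
	int node = sim_numa_node_id();
	int ok;

	lists[node].locked = 1;
	sim_migrate(migrate_to);		/* task moves in between */
	ok = lists[node].locked;		/* same index as the lock */
	lists[node].locked = 0;
	return ok;
}
```

Starting on node 0 and moving to node 1 in the middle, reap_recomputed()
reports a failed check while reap_cached() does not. Without a move, both
variants pass.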
