Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

From: Petr Vandrovec
Date: Thu Sep 22 2005 - 16:26:19 EST


Andrew Morton wrote:
Christoph Lameter <clameter@xxxxxxxxxxxx> wrote:

On Wed, 21 Sep 2005, Christoph Lameter wrote:

> Hmm. This likely has something to do with debugging code. I was unable to > reproduce this on amd64 with your config. I get another failure with > 2.6.14-rc2 on ia64 if I enable all the debugging features that you have. > The system works fine if no debugging is configured:
> > kernel BUG at kernel/workqueue.c:541!
> swapper[1]: bugcheck! 0 [1]

I fixed the above issue (a structure became larger than the maximum allowed by the slab allocator) and the kernel boots fine now on an 8 way ia64. Cannot reproduce the problem.


Petr can. I think we're still waiting for him to test the below (please):

Sorry, I've missed that half of email completely. Yes, it seems to fix problem,
box has currently 8 min uptime, which is 7:55 more than it survived before.
Thanks,
Petr Vandrovec

Could you try the following patch:

---

The numa slab allocator may allocate pages from foreign nodes onto the lists
for a particular node if a node runs out of memory. Inspecting the slab->nodeid
field will not reflect that the page is now in use for the slabs of another node.

This patch fixes that issue by adding a node field to free_block so that the caller
can indicate which node currently uses a slab.

Also removes the check for the current node from kmalloc_cache_node since the
process may shift later to another node which may lead to an allocation on another
node than intended.

Signed-off-by: Christoph Lameter <clameter@xxxxxxx>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/