Re: [git pull] m68k SLUB fix for 2.6.39

From: Michael Schmitz
Date: Fri Apr 29 2011 - 19:36:37 EST


Geert Uytterhoeven wrote:
On Thu, Apr 28, 2011 at 23:41, David Rientjes <rientjes@xxxxxxxxxx> wrote:
On Thu, 28 Apr 2011, James Bottomley wrote:



I think what the N_NORMAL_MEMORY patch did is just make it take a while
before you start allocating from that range. Try executing a memory
balloon on the platform; that was how we first demonstrated the problem
on parisc.

With parisc, you encountered an oops in add_partial() because the
kmem_cache_node structure for the memory range returned by page_to_nid()
was not allocated. init_kmem_cache_nodes() takes care of this for all
memory ranges set in N_NORMAL_MEMORY.
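
To illustrate the failure mode described above, here is a toy userspace model in C. It is only a sketch and not the kernel's code: every toy_* name, TOY_MAX_NODES, and the plain bitmask standing in for the N_NORMAL_MEMORY node state are assumptions made for illustration. It shows per-node state being allocated only for nodes in the mask, so a page whose node id falls outside the mask finds no structure. The real add_partial() reportedly oopsed at that point; the model returns an error instead.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy userspace model, NOT kernel code. All names here are hypothetical. */
#define TOY_MAX_NODES 8

struct toy_cache_node {
    unsigned long nr_partial;   /* number of partial slabs on this node */
};

struct toy_cache {
    struct toy_cache_node *node[TOY_MAX_NODES];
};

/* Release whatever per-node state was allocated. */
static void toy_free_cache_nodes(struct toy_cache *c)
{
    for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
        free(c->node[nid]);
        c->node[nid] = NULL;
    }
}

/*
 * Allocate per-node state only for nodes whose bit is set in normal_mask
 * (the mask stands in for the N_NORMAL_MEMORY node state).
 * Returns 0 on success, -1 if an allocation fails.
 */
static int toy_init_cache_nodes(struct toy_cache *c, unsigned int normal_mask)
{
    for (int nid = 0; nid < TOY_MAX_NODES; nid++)
        c->node[nid] = NULL;

    for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
        if (!(normal_mask & (1u << nid)))
            continue;
        c->node[nid] = calloc(1, sizeof(*c->node[nid]));
        if (!c->node[nid]) {
            toy_free_cache_nodes(c);
            return -1;
        }
    }
    return 0;
}

/*
 * Stand-in for adding a slab to a node's partial list. page_nid plays the
 * role of the value returned by page_to_nid(). If that node was not in the
 * mask at init time there is no per-node structure to use; the model
 * reports this with -1 rather than dereferencing a NULL pointer.
 */
static int toy_add_partial(struct toy_cache *c, int page_nid)
{
    if (page_nid < 0 || page_nid >= TOY_MAX_NODES)
        return -1;
    if (!c->node[page_nid])
        return -1;
    c->node[page_nid]->nr_partial++;
    return 0;
}
```

With only bit 0 set in the mask, toy_add_partial() succeeds for node 0 and fails for node 1, which corresponds to a page from a memory range that was never marked in the mask.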

Adding Christoph and Pekka to the cc in case there are additional concerns
about SLUB on this architecture.

My ARAnyM instance has

System Memory: 276480K
14 MB at 0x00000000 (ST-RAM)
256 MB at 0x01000000 (alternate RAM)

and 137800 KiB of swap, and it survived the following program just fine:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
	size_t size = 1048576;	/* start with 1 MiB chunks */
	size_t total = 0;
	void *p;

	while (size) {
		p = malloc(size);
		if (!p) {
			printf("Failed to allocate %zu bytes\n", size);
			size /= 2;
			continue;	/* retry smaller; never memset() a NULL pointer */
		}
		/* Touch every byte so the pages are really backed by memory. */
		memset(p, 0xaa, size);
		total += size;
		printf("Using %zu / 0x%zx bytes of memory\n", total, total);
	}

	printf("Finished!\n");
	return 0;
}

i.e. the OOM-killer just killed the program after it consumed all
available virtual memory:

Out of memory: Kill process 1727 (malloctest) score 854 or sacrifice child
Killed process 1727 (malloctest) total-vm:361160kB, anon-rss:224164kB, file-rss:0kB
malloctest: page allocation failure. order:0, mode:0x84d0

So SLUB really seems to work now.

I forgot to mention how I tested. On every kernel that I could actually boot, I ran slabinfo -l and slabinfo -T (I saved the output in case anyone wants to analyze it). Any kernel that survived this was considered good in the original bisect. The current fix was also tested on the actual hardware.

There were quite a few kernels that initially booted but died on the first slabinfo invocation. They invariably died at the e2fsck stage when rebooted after that.

Using your test, ARAnyM, 14MB/128MB RAM and 134MB swap:

Out of memory: Kill process 1376 (malloctest) score 944 or sacrifice child
Killed process 1376 (malloctest) total-vm:272756kB, anon-rss:135232kB, file-rss:0kB

Falcon CT60, 14MB/512MB no swap:

Out of memory: Kill process 8644 (malloctest) score 967 or sacrifice child
Killed process 8644 (malloctest) total-vm:512244kB, anon-rss:510088kB, file-rss:284kB

HTH,

Michael

