Re: [GIT PULL] scheduler fixes

From: Pekka Enberg
Date: Mon May 25 2009 - 14:43:44 EST

Hi Linus,

Linus Torvalds wrote:
On Mon, 25 May 2009, Pekka J Enberg wrote:
diff --git a/init/main.c b/init/main.c
index 33ce929..fb0e004 100644
--- a/init/main.c
+++ b/init/main.c
@@ -576,6 +576,22 @@ asmlinkage void __init start_kernel(void)
 	smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
+	build_all_zonelists();
+	page_alloc_init();
+	printk(KERN_NOTICE "Kernel command line: %s\n", boot_command_line);
+	parse_early_param();
+	parse_args("Booting kernel", static_command_line, __start___param,
+		   __stop___param - __start___param,
+		   &unknown_bootoption);
+	pidhash_init();
+	vmalloc_init();
+	vfs_caches_init_early();
+	/*
+	 * Set up kernel memory allocators
+	 */
+	mem_init();
+	kmem_cache_init();

So what strikes me is a question:

- why do we want to do pidhash_init and vfs_caches_init_early() so early?

Yes, pidhash_init() now uses alloc_bootmem. It's an allocation that is not trivially small, but it's not humongous either (max 4096 hash list heads, one pointer each).

I can certainly fix that up to use kmalloc() or vmalloc(). I moved it because I wasn't sure how much it's actually allocating and wanted to do the conservative thing here.
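
For a rough sense of scale, the figure quoted above (max 4096 hash list heads, one pointer each) can be worked out with a small user-space sketch. Note that hash_table_bytes() is a made-up helper for illustration, not a kernel function, and the pointer sizes passed in are assumptions about the target machine.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the kernel: bytes needed for a hash
 * table with `heads` list heads when each head is a single pointer of
 * `pointer_size` bytes.
 */
static size_t hash_table_bytes(size_t heads, size_t pointer_size)
{
	return heads * pointer_size;
}
```

With 4096 heads this comes to 32768 bytes (32 KiB) for 8-byte pointers and 16384 bytes (16 KiB) for 4-byte pointers, which matches "not trivially small, but not humongous either".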

Linus Torvalds wrote:
And vfs_caches_init_early() is actually doing some rather strange things, like doing an "alloc_large_system_hash()" but not unconditionally: it does it in the "late" initialization too, if not done early. inode_init_early does something very similar (ie a _conditional_ early init).

So none of this seems to really get a huge advantage from the early init. There seem to be some subtle NUMA issues, but do we really want that? I get the feeling that nobody ever wanted to do it early, and then the NUMA people said "I don't want to do this early, but I don't want to touch the non-NUMA case, so I'll do it early for non-numa, and late for numa".
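
The "conditional early init" pattern described above can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions: every name here (hash_table, numa_distribute, alloc_table(), cache_init_early(), cache_init_late()) is a hypothetical stand-in, calloc() stands in for the real allocator, and the actual kernel functions may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not the kernel's. */
static void *hash_table;      /* table that either init stage may set up */
static bool numa_distribute;  /* assumed policy flag: defer on NUMA */

/* Stand-in allocator: one pointer-sized slot per entry, zeroed. */
static void *alloc_table(size_t entries)
{
	return calloc(entries, sizeof(void *));
}

/* Early stage: allocates only in the non-NUMA case. */
static void cache_init_early(size_t entries)
{
	if (numa_distribute)
		return;		/* leave it to the late stage */
	hash_table = alloc_table(entries);
}

/* Late stage: allocates only if the early stage skipped it. */
static void cache_init_late(size_t entries)
{
	if (hash_table)
		return;		/* already set up early */
	hash_table = alloc_table(entries);
}
```

Either way the table ends up allocated exactly once. The only thing the flag changes is which stage does the work, which is why the early call buys little if nothing in between depends on the table.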

SLUB does sysfs setup in kmem_cache_init(), and I saw some oopses if I don't call vfs_caches_init_early() first. I didn't look too closely, though.

Linus Torvalds wrote:
I'm also not entirely sure we really need to do vmalloc_init() that early, but I dunno. It also uses alloc_bootmem().

We can do that later but then we need to fix up vmalloc_init(). There's actually a patch floating around to do that.
