Re: module boot time (was Re: [PATCH] module: Use binary search in lookup_symbol())
From: Tim Bird
Date: Fri May 20 2011 - 17:30:07 EST
On 05/19/2011 12:56 PM, Jeff Mahoney wrote:
> On 05/18/2011 05:34 PM, Greg KH wrote:
>> I don't think that's worth it, there has been talk, and some initial
>> code, about adding kernel modules to the kernel image itself, which
>> would solve a lot of the i/o time of loading modules, and solves some
>> other boot speed problems. That work was done by Jeff Mahoney, but I
>> think he ran into some "interesting" linker issues and had to shelve it
>> due to a lack of time :(
>
> I had a few attempts at this before I had to move on to other things. I
> haven't gotten a chance to take another look.
>
> I had two approaches:
>
> 1) Statically link in modules post-build. This actually worked but had
> some large caveats. The first was that an un-relocated image (vmlinux.o)
> was needed in order to make it work and a ~ 200 MB footprint to gain a
> fairly small win in boot time didn't seem like a good tradeoff. The
> other issue was more important and is what made me abandon this
> approach: If the entire image is re-linked then the debuginfo package
> that we as a distributor offer but don't typically install becomes
> invalid. Our support teams would not be too thrilled with the idea of
> crash dumps that can't be used.
>
> 2) Build a "megamodule" that is loaded like an initramfs but is already
> internally linked and requires no additional userspace. I got the
> megamodule creation working but didn't get the loading aspect of it done
> yet.
>
> In both cases, I added the regular initcall sections to the modules in
> addition to the module sections so they'd be loaded in the order they
> would have been if they were actually statically linked.
>
> I hadn't thought about it until now and it may not actually work, but it
> could be possible to use the megamodule approach *and* link it into a
> static vmlinux image as an appended section that's optionally used.
What was the use case for this? My use case is that I want
to use all the modules compiled into the kernel, but I don't
want to run some modules' initcalls until well after kernel
and user-space startup.
My solution is very simple: create a new initcall macro for
the initcalls I want to defer, along with a new 'deferred' initcall
section where the function entries can be accumulated. Then,
I skip freeing init memory at the normal point in boot. Once
the main user-space has initialized, it reads a /proc file
(/proc/deferred_initcalls), which causes the deferred initcalls to be
called and the init memory to be freed.
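The mechanism above can be sketched in userspace C. This is a minimal sketch, assuming a GNU-compatible toolchain on ELF (GNU ld or lld), where the linker synthesizes __start_<section>/__stop_<section> symbols for any section whose name is a valid C identifier; those stand in for the __def_initcall_start/__def_initcall_end symbols the patch defines in vmlinux_32.lds.S. foo_init, bar_init, initcalls_run and init_memory_freed are hypothetical placeholders, and the second-call return of -1 replaces the patch's "already run" printk.

```c
#include <stdio.h>
#include <stddef.h>

/* Same shape as the kernel's initcall_t: returns 0 on success. */
typedef int (*initcall_t)(void);

/*
 * Userspace analogue of the patch's deferred_initcall(): store a pointer
 * to fn in a dedicated section. The name has no leading dot, so GNU ld
 * (and lld) synthesize __start_/__stop_ symbols for it on ELF.
 */
#define deferred_initcall(fn)                                   \
	static initcall_t deferred_entry_##fn                   \
	__attribute__((used, section("deferred_initcall"))) = fn

/* Section bounds provided by the linker (like __def_initcall_start/end). */
extern initcall_t __start_deferred_initcall[];
extern initcall_t __stop_deferred_initcall[];

static int initcalls_run;       /* how many deferred routines have run */
static int init_memory_freed;   /* stands in for free_initmem() */

/* Hypothetical driver init routines. */
static int foo_init(void) { initcalls_run++; return 0; }
static int bar_init(void) { initcalls_run++; return 0; }

deferred_initcall(foo_init);
deferred_initcall(bar_init);

/* Run each deferred initcall once, then "free" init memory.
 * Returns 0 on the first call and -1 on any later call. */
int do_deferred_initcalls(void)
{
	static int already_run;
	initcall_t *call;

	if (already_run)
		return -1;
	already_run = 1;

	for (call = __start_deferred_initcall;
	     call < __stop_deferred_initcall; call++) {
		int ret = (*call)();
		if (ret)
			fprintf(stderr, "deferred initcall #%td failed: %d\n",
				call - __start_deferred_initcall, ret);
	}
	init_memory_freed = 1;
	return 0;
}
```

Keeping registration in a linker section means each driver contributes one pointer and the runner never needs a list of names, which is the same reason the kernel collects ordinary initcalls this way. Entry order follows link order, so within one file it is best not relied upon.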
I'm attaching the patch below, because it's short enough to
see what's going on without a lot of digging.
This method eliminates the linking cost for module loading,
saves the memory overhead of the ELF module format, and gives
me control over when the deferred modules will be initialized.
The big downside is that you have to manually change the definition
for the initcall from 'module_init' to 'deferred_module_init'
for the modules you want to defer. Maybe there's a simple way
to control this with a kernel config? That would make this a pretty
nice, generic system for deferring module initialization, IMHO.
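On the kernel-config question, one possible shape is to make deferred_module_init() fall back to module_init() when deferral is configured out, and let individual drivers choose via their own option. CONFIG_DEFERRED_INITCALLS, CONFIG_FOO_DEFER_INIT and foo_init are invented for illustration. Each driver still has to be touched once, but after that the choice lives in the kernel config. The free_initmem() call commented out in init_post() would need the same conditional, so init memory is still freed at boot when the feature is off. This is a header/driver fragment, not a standalone program.

```c
/* include/linux/init.h (hypothetical): degrade gracefully when the
 * deferral feature is configured out, so converted drivers still
 * initialize at boot. */
#ifdef CONFIG_DEFERRED_INITCALLS
#define deferred_module_init(x) deferred_initcall(x);
#else
#define deferred_module_init(x) module_init(x);
#endif

/* drivers/foo/foo.c (hypothetical): a per-driver option picks the
 * behaviour instead of hard-coding one macro or the other. */
#ifdef CONFIG_FOO_DEFER_INIT
deferred_module_init(foo_init);
#else
module_init(foo_init);
#endif
```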
If your use case is that you want all the modules present, but
want to initialize only some of them later, then maybe a list of
module names could be passed into the /proc interface, and the
routine could selectively initialize the deferred modules.
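For the selective variant, one possible shape is a table of named deferred entries, with the /proc handler parsing a list of names and running only the matching entries that have not run yet. This is a hypothetical sketch: the table, the names foo_driver/bar_driver and run_deferred_by_name() are invented for illustration, and the /proc write plumbing is omitted.

```c
#include <stddef.h>
#include <string.h>

typedef int (*initcall_t)(void);

/* A deferred initcall tagged with the name of the module it belongs to. */
struct deferred_entry {
	const char *name;
	initcall_t fn;
	int done;               /* set once fn has been called */
};

static int foo_calls, bar_calls;
static int foo_init(void) { foo_calls++; return 0; }
static int bar_init(void) { bar_calls++; return 0; }

/* Hypothetical table; a kernel version could collect it from a section. */
static struct deferred_entry deferred_table[] = {
	{ "foo_driver", foo_init, 0 },
	{ "bar_driver", bar_init, 0 },
};

#define N_DEFERRED (sizeof(deferred_table) / sizeof(deferred_table[0]))

/*
 * Run the deferred initcalls named in a comma- or whitespace-separated
 * list, e.g. "foo_driver,bar_driver\n". Unknown names are ignored and
 * each entry runs at most once. Returns the number of initcalls invoked.
 */
int run_deferred_by_name(const char *list)
{
	static const char sep[] = " ,\t\n";
	int ran = 0;

	while (*list) {
		list += strspn(list, sep);              /* skip separators */
		size_t len = strcspn(list, sep);        /* next name's length */
		if (len == 0)
			break;                          /* only separators left */

		for (size_t i = 0; i < N_DEFERRED; i++) {
			struct deferred_entry *e = &deferred_table[i];
			if (!e->done && strlen(e->name) == len &&
			    strncmp(e->name, list, len) == 0) {
				e->fn();        /* return value ignored here */
				e->done = 1;
				ran++;
			}
		}
		list += len;
	}
	return ran;
}
```

Tracking a per-entry done flag lets user space enable drivers in several stages (say, storage first, then sound) without any entry running twice; init memory could only be freed once every entry is done.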
Patch (for 2.6.27, I believe) follows. This is for discussion
only; I wouldn't expect it to apply to mainline.
commit 1fab0d6a932d000780cd232b7d10ebfbe69f477c
Author: Tim Bird <tim.bird@xxxxxxxxxxx>
Date: Fri Sep 12 11:31:52 2008 -0700
Add deferred_module_init
This allows statically linked modules to be initialized sometime after
the initial bootstrap. To do this, change the module_init() macro
to deferred_module_init(), for those init routines you want to defer.
Signed-off-by: Tim Bird <tim.bird@xxxxxxxxxxx>
diff --git a/arch/x86/kernel/vmlinux_32.lds.S b/arch/x86/kernel/vmlinux_32.lds.S
index a9b8560..f5bdfc4 100644
--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -140,11 +140,21 @@ SECTIONS
*(.con_initcall.init)
__con_initcall_end = .;
}
+ .deferred_initcall.init : AT(ADDR(.deferred_initcall.init) - LOAD_OFFSET) {
+ __def_initcall_start = .;
+ *(.deferred_initcall.init)
+ __def_initcall_end = .;
+ }
.x86_cpu_dev.init : AT(ADDR(.x86_cpu_dev.init) - LOAD_OFFSET) {
__x86_cpu_dev_start = .;
*(.x86_cpu_dev.init)
__x86_cpu_dev_end = .;
}
+ .x86cpuvendor.init : AT(ADDR(.x86cpuvendor.init) - LOAD_OFFSET) {
+ __x86cpuvendor_start = .;
+ *(.x86cpuvendor.init)
+ __x86cpuvendor_end = .;
+ }
SECURITY_INIT
. = ALIGN(4);
.altinstructions : AT(ADDR(.altinstructions) - LOAD_OFFSET) {
diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c
index 59ea42e..a247a8e 100644
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -703,6 +703,22 @@ static int execdomains_read_proc(char *page, char **start, off_t off,
return proc_calc_metrics(page, start, off, count, eof, len);
}
+extern void do_deferred_initcalls(void);
+
+static int deferred_initcalls_read_proc(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ static int deferred_initcalls_done = 0;
+ int len;
+
+ len = sprintf(page, "%d\n", deferred_initcalls_done);
+ if (! deferred_initcalls_done) {
+ do_deferred_initcalls();
+ deferred_initcalls_done = 1;
+ }
+ return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
#ifdef CONFIG_PROC_PAGE_MONITOR
#define KPMSIZE sizeof(u64)
#define KPMMASK (KPMSIZE - 1)
@@ -855,6 +871,7 @@ void __init proc_misc_init(void)
{"filesystems", filesystems_read_proc},
{"cmdline", cmdline_read_proc},
{"execdomains", execdomains_read_proc},
+ {"deferred_initcalls", deferred_initcalls_read_proc},
{NULL,}
};
for (p = simple_ones; p->name; p++)
diff --git a/include/linux/init.h b/include/linux/init.h
index ad63824..ef61767 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -200,6 +200,7 @@ extern void (*late_time_init)(void);
#define device_initcall_sync(fn) __define_initcall("6s",fn,6s)
#define late_initcall(fn) __define_initcall("7",fn,7)
#define late_initcall_sync(fn) __define_initcall("7s",fn,7s)
+//#define deferred_initcall(fn) __define_initcall("8",fn,8)
#define __initcall(fn) device_initcall(fn)
@@ -214,6 +215,10 @@ extern void (*late_time_init)(void);
static initcall_t __initcall_##fn \
__used __section(.security_initcall.init) = fn
+#define deferred_initcall(fn) \
+ static initcall_t __initcall_##fn \
+ __used __section(.deferred_initcall.init) = fn
+
struct obs_kernel_param {
const char *str;
int (*setup_func)(char *);
@@ -254,6 +259,7 @@ void __init parse_early_param(void);
* be one per module.
*/
#define module_init(x) __initcall(x);
+#define deferred_module_init(x) deferred_initcall(x);
/**
* module_exit() - driver exit entry point
diff --git a/init/main.c b/init/main.c
index 27f6bf6..e4bbdb2 100644
--- a/init/main.c
+++ b/init/main.c
@@ -789,12 +789,40 @@ static void run_init_process(char *init_filename)
kernel_execve(init_filename, argv_init, envp_init);
}
+extern initcall_t __def_initcall_start[], __def_initcall_end[];
+
+/* call deferred init routines */
+void do_deferred_initcalls(void)
+{
+ initcall_t *call;
+ static int already_run=0;
+
+ if (already_run) {
+ printk("do_deferred_initcalls() has already run\n");
+ return;
+ }
+
+ already_run=1;
+
+ printk("Running do_deferred_initcalls()\n");
+
+ lock_kernel(); /* make environment similar to early boot */
+
+ for(call = __def_initcall_start; call < __def_initcall_end; call++)
+ do_one_initcall(*call);
+
+ flush_scheduled_work();
+
+ free_initmem();
+ unlock_kernel();
+}
+
/* This is a non __init function. Force it to be noinline otherwise gcc
* makes it inline to init() and it becomes part of init.text section
*/
static int noinline init_post(void)
{
- free_initmem();
+ //free_initmem();
unlock_kernel();
mark_rodata_ro();
system_state = SYSTEM_RUNNING;
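On the user-space side, the trigger is just a read of the proc file, typically `cat /proc/deferred_initcalls` at the end of the init scripts. The read handler in the patch formats its flag before setting it, so the first read should return 0 (and run the initcalls) and later reads 1. A small C equivalent, with trigger_deferred_initcalls() as a hypothetical helper name:

```c
#include <stdio.h>

/*
 * Read the deferred-initcalls proc file once, which (with the patch
 * applied) runs the deferred initcalls on the first read.
 * Returns the integer read from the file, or -1 on error.
 */
int trigger_deferred_initcalls(const char *path)
{
	FILE *f = fopen(path, "r");
	int done = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &done) != 1)
		done = -1;              /* file empty or not a number */
	fclose(f);
	return done;
}
```

Called as trigger_deferred_initcalls("/proc/deferred_initcalls") from an init program, it behaves like the `cat` above but also reports whether this was the read that actually ran the initcalls.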
=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================