Re: [RFC PATCH 0/6] module, kbuild: Faster boot with custom kernel.
From: Kay Sievers
Date: Thu Feb 19 2009 - 15:48:50 EST
On Thu, Feb 19, 2009 at 12:41, Kay Sievers <kay.sievers@xxxxxxxx> wrote:
> On Thu, Feb 19, 2009 at 12:15, Rusty Russell <rusty@xxxxxxxxxxxxxxx> wrote:
>> On Thursday 19 February 2009 00:27:58 Kay Sievers wrote:
>>> some modules wait for 200-500 milliseconds to
>>> get the lock released, some larger modules spend 50 milliseconds in
>>> load_module(), many of them around 20 milliseconds.
>>
>> OK, this is an untested hack (don't try unloading modules, not sure symbol
>> resolution isn't racy now I've killed the lock). Does it change the numbers?
>
> That changes it dramatically. The numbers from the syscall until the
> linked-in module are now down to 15-25 milliseconds, and for a few
> large modules 50-100.
>
> (One crazy exception is ipv6, which takes 620 milliseconds to link, no
> idea what it needs to do.)
Sorry, this was caused by I/O wait on the disk for that huge module. It
drops to reasonable numbers once all modules are put into RAM before
loading them.
> I'll compare a few bootup times with and without the patch, and come
> back later today with the real numbers.
The whole massively parallel modprobe run happens during udev coldplug. I
tested on a 2 GHz dual-core laptop with a setup without initramfs, which
loads ~40 modules. All the kernel modules are copied to a ramfs mount
before the coldplug is started.
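A minimal sketch of the "copy the modules into RAM first" step, to take disk I/O out of the measurement. The message does not say how the copy was made or how modprobe was pointed at it, so the function name `copy_modules` and both paths in the usage comment are hypothetical; the destination is assumed to be an already mounted ramfs (or tmpfs) directory.

```python
import os
import shutil


def copy_modules(src_dir, dst_dir):
    """Copy every *.ko file under src_dir into dst_dir, keeping the layout.

    Returns the sorted list of copied paths, relative to src_dir.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if not name.endswith(".ko"):
                continue  # skip index files and anything that is not a module
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel)
    return sorted(copied)


# Hypothetical usage (paths are assumptions, not from the message):
#   copy_modules("/lib/modules/" + os.uname().release, "/mnt/ramfs-modules")
```

Only the module files themselves are copied here; anything else the loader needs from the module directory would have to be handled separately.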
I measured the time from the first modprobe seen in the kernel to the
loading of "dummy", which I trigger manually from the udev boot script
right after udev has settled and handled all events.
With the mutex it takes 1.8 seconds; without it, 1.3 seconds. If I
also comment out the creation of the stop_machine() threads, it gets
down to 1.1 seconds.
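To put the three coldplug timings side by side, here is a small worked calculation. The numbers are the ones quoted above, converted to milliseconds; the helper name `saving` is only for illustration.

```python
def saving(baseline_ms, measured_ms):
    """Return (milliseconds saved, percent saved) relative to the baseline."""
    saved = baseline_ms - measured_ms
    return saved, round(100.0 * saved / baseline_ms, 1)


BASELINE_MS = 1800  # with the mutex

for label, ms in [
    ("without mutex", 1300),
    ("without mutex and stop_machine() threads", 1100),
]:
    saved, pct = saving(BASELINE_MS, ms)
    print(f"{label}: {ms} ms, saves {saved} ms ({pct}%)")

# prints:
# without mutex: 1300 ms, saves 500 ms (27.8%)
# without mutex and stop_machine() threads: 1100 ms, saves 700 ms (38.9%)
```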
With the mutex, I see code waiting up to 180 milliseconds for the
mutex, with the average between 20 and 40 milliseconds.
Without the mutex the largest time to link is 30 milliseconds, and
most of them are around 5-10 milliseconds.
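A toy model of why waits under one global lock can be much longer than any single link time: if several loaders queue up at once, each one waits for the sum of the hold times ahead of it. This is a simplification, assuming all loaders arrive together and take the lock in list order; the hold times below are made-up illustration values, not measurements from this test.

```python
from itertools import accumulate


def queue_waits(hold_ms):
    """Wait time of each loader when all arrive at once and take one
    global lock in list order; hold_ms[i] is how long loader i holds it."""
    return [end - hold for end, hold in zip(accumulate(hold_ms), hold_ms)]


# Hypothetical hold times in milliseconds (illustration only).
holds = [20, 10, 50, 5, 30]
waits = queue_waits(holds)
print(waits)                    # prints [0, 20, 30, 80, 85]
print(max(waits))               # prints 85
print(sum(waits) / len(waits))  # prints 43.0
```

In this model the longest wait (85 ms) exceeds the longest hold time (50 ms), which is the same shape as the numbers above: long waits with the mutex, short link times without it.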
Without the mutex and the stop_machine() creation, the flow of tracing
output looks like real work: depending on the actual module, they
spend time in various stages of linking, relocation and so on. There
are no long delays for any of the modules, unlike what I see with the
current code.
It would be great if we could safely minimize the time spent in the mutex.
Thanks,
Kay