Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

From: Rasmus Villemoes
Date: Wed Mar 10 2021 - 19:18:45 EST


On 09/03/2021 23.16, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
> <linux@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> So add an initramfs_async= kernel parameter, allowing the main init
>> process to proceed to handling device_initcall()s without waiting for
>> populate_rootfs() to finish.
>
> Oh, and a completely unrelated second comment about this: some of the
> initramfs population code seems to be actively written to be slow.
>
> For example, I'm not sure why that write_buffer() function uses an
> array of indirect function pointer actions. Even ignoring the whole
> "speculation protections make that really slow" issue that came later,
> it seems to always have been actively (a) slower and (b) more complex.
>
> The normal way to do that is with a simple switch() statement, which
> makes the compiler able to do a much better job. Not just for the
> state selector - maybe it picks that function pointer approach, but
> probably these days just direct comparisons - but simply to do things
> like inline all those "it's used in one place" cases entirely. In
> fact, at that point, a lot of the state machine changes may end up
> turning into pure goto's - compilers are sometimes good at that
> (because state machines have often been very timing-critical).

FWIW, I tried doing

--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -401,24 +401,26 @@ static int __init do_symlink(void)
return 0;
}

-static __initdata int (*actions[])(void) = {
- [Start] = do_start,
- [Collect] = do_collect,
- [GotHeader] = do_header,
- [SkipIt] = do_skip,
- [GotName] = do_name,
- [CopyFile] = do_copy,
- [GotSymlink] = do_symlink,
- [Reset] = do_reset,
-};
-
static long __init write_buffer(char *buf, unsigned long len)
{
+ int ret;
+
byte_count = len;
victim = buf;

- while (!actions[state]())
- ;
+ do {
+ switch (state) {
+ case Start: ret = do_start(); break;
+ case Collect: ret = do_collect(); break;
+ case GotHeader: ret = do_header(); break;
+ case SkipIt: ret = do_skip(); break;
+ case GotName: ret = do_name(); break;
+ case CopyFile: ret = do_copy(); break;
+ case GotSymlink: ret = do_symlink(); break;
+ case Reset: ret = do_reset(); break;
+ }
+ } while (!ret);
+
return len - byte_count;
}


and yes, all the do_* functions get inlined into write_buffer with some
net space saving:

add/remove: 0/9 grow/shrink: 1/0 up/down: 1696/-2112 (-416)
Function old new delta
write_buffer 100 1796 +1696
actions 32 - -32
do_start 52 - -52
do_reset 112 - -112
do_skip 128 - -128
do_collect 144 - -144
do_symlink 172 - -172
do_copy 304 - -304
do_header 572 - -572
do_name 596 - -596
Total: Before=5360, After=4944, chg -7.76%

(ppc32 build). But the unpacking still takes the same time. It might be
different on x86.

Rasmus