Re: [GIT PULL] remove ksys_mount() and ksys_dup()

From: Al Viro
Date: Wed Dec 18 2019 - 08:37:21 EST


On Wed, Dec 18, 2019 at 08:51:19AM +0100, Dominik Brodowski wrote:
> On Tue, Dec 17, 2019 at 10:57:43PM +0000, Al Viro wrote:
> > Seriously, these parts of init/* ought to be treated as userland code
> > that runs in kernel mode mostly because it's too much PITA to arrange
> > building a static ELF binary and linking it into the image.
>
> Well, we have had the infrastructure for fork_usermode_blob() in the kernel
> since May 2018, though it is not really used so far (the bpfilter blob is
> just reporting its existence, and not doing anything substantial). Probably,
> significant parts of init/* could be migrated to such a blob. Would that be
> an alternative generally preferred, or is its dependence on CC_CAN_LINK a
> showstopper?

Well... everything from default_rootfs/populate_rootfs call (incidentally,
stuff starting to leak into rootfs_initcall level shouldn't be mixed
with those two, but that's a separate story) and on to the end of
kernel_init_freeable() could be forked; the other bit (actual execve
attempts in the end of kernel_init()) must be in PID 1.

The problem is not just CC_CAN_LINK, it's the damn size of binaries...

> FWIW, non-initramfs boot code is considered to be "legacy" since 2.6.16, see
> filesystems/ramfs-rootfs-initramfs.txt:
>
> | Today (2.6.16), initramfs is always compiled in, but not always used. The
> | kernel falls back to legacy boot code that is reached only if initramfs does
> | not contain an /init program. The fallback is legacy code, there to ensure a
> | smooth transition and allowing early boot functionality to gradually move to
> | "early userspace" (I.E. initramfs).
> |
> | ...
> |
> | This kind of complexity (which inevitably includes policy) is rightly handled
> | in userspace. Both klibc and busybox/uClibc are working on simple initramfs
> | packages to drop into a kernel build.
> |
> | The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
> | The kernel's current early boot code (partition detection, etc) will probably
> | be migrated into a default initramfs, automatically created and used by the
> | kernel build.
>
> That plan seems to have been obsoleted long ago, right?

klibc is not in mainline and I hadn't heard of attempts to get it into the
kernel git tree for quite a few years. Whether this "just call sys_...()
instead of doing normal syscalls" is a stopgap measure until that happens
or something more permanent, the effect is the same - not poking in the
kernel internals from code with lousy test coverage...