On 5/24/23 23:36, Luis Chamberlain wrote:
Add support to use the new kread_uniq_fd() to avoid duplicate kernel
reads on modules. At the cost of about ~945 bytes to your kernel size,
enabling this on a 255 CPU x86_64 qemu guest this saves about ~1.8 GiB
of memory during boot which would otherwise be free'd, and reduces boot
time by about ~11 seconds.
Userspace loads modules through finit_module(), this in turn will
use vmalloc space up to 3 times:
a) The kernel_read_file() call
b) Optional module decompression
c) Our final copy of the module
Commit 064f4536d139 ("module: avoid allocation if module is already
present and ready") shows a graph of the amount of vmalloc space
observed allocated but freed for duplicate module request which end
up in the trash bin. Since there is a linear relationship with the
number of CPUs eventually this will bite us and you end up not being
able to boot. That commit put a stop gap for c) but to avoid the
vmalloc() space wasted on a) and b) we need to detect duplicates
earlier.
We could just have userspace fix this, but as reviewed at LSFMM 2023
this year in Vancouver, fixing this in userspace can be complex and we
also can't know when userpace is fixed. Fixing this in kernel turned
out to be easy with the inode and with a simple kconfig option we can
let users / distros decide if this full stop gap is worthy to enable.
kmod normally uses finit_module() only if a module is not compressed,
otherwise it decompresses it first and then invokes init_module().
Looking at Fedora and openSUSE Tumbleweed, they compress kernel modules
with xz and zstd, respectively. They also have their kernels built
without any CONFIG_MODULE_COMPRESS_{GZIP,XZ,ZSTD} options.
It means that these and similarly organized distributions end up using
init_module(), and adding complexity to optimize finit_module() wouldn't
actually help in their case.
-- Petr