Re: [RFC 00/12] module: avoid userspace pressure on unwanted allocations

From: Luis Chamberlain
Date: Tue Mar 28 2023 - 02:16:15 EST


On Tue, Mar 28, 2023 at 05:44:40AM +0200, David Hildenbrand wrote:
> ... do you have an updated patch/branch that includes the feedback from
> Linus so I can give it a churn tomorrow?

Yeah sure:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230327-module-alloc-opts

The commit log needs updating to reflect the results I just collected:

With the alloc patch ("module: avoid allocation if module is already
present and ready") I see a 145 MiB difference in memory compared to
the patch before it, "module: extract patient module check into helper".
So I think that's a clear keeper and should help large CPU count boots.

The patch "module: add concurrency limiter" which puts the concurrency
delimiter on the kread only saves about 2 MiB with 100 stress-ng ops,
which seems to be what I needed to reproduce your original 400 CPU count
issue.

The program used to reproduce is stress-ng with the new module option:

echo 0 > /proc/sys/vm/oom_dump_tasks
./stress-ng --module 100 --module-name xfs

To see how much max memory I use, I just use:

free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > foo.log

Run the test in another window, CTRL-C the test once the free sampling
above finishes after 40 seconds, and then:

sort -n -r foo.log | head -1
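
To compare two runs (say, before and after a patch), the peak lookup
above can be wrapped in a couple of helpers. This is only a rough sketch:
the function names and the log file names are made up for illustration,
and it assumes each log has one used-memory value in KiB per line, as
produced by the free/awk pipeline above.

```shell
# Hypothetical helpers, not part of the patch series.

# Print the largest value found in a log with one number per line.
peak_kib() {
    sort -n -r "$1" | head -n 1
}

# Print the difference between the peaks of two logs in whole MiB
# (first log minus second log; integer division truncates).
delta_mib() {
    base=$(peak_kib "$1")
    new=$(peak_kib "$2")
    echo $(( (base - new) / 1024 ))
}
```

For example, "delta_mib before.log after.log" would print how many MiB
lower the peak was in the second run, where before.log and after.log are
whatever names you saved the two foo.log captures under.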

If you have xfs loaded already you probably wanna pick a module just as
big that you don't have loaded. You must have its dependencies loaded
already as it doesn't call modprobe, it just finit_module's the module.
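
A quick way to check whether a candidate module is already loaded before
picking it could look like the sketch below. The helper name is made up,
and the optional second argument exists only so the check can be pointed
at a file other than /proc/modules. Note that /proc/modules lists names
with underscores, so pass the name in that form.

```shell
# Hypothetical helper, not part of stress-ng or the patch series.
# Succeeds if module $1 appears in the modules list ($2, which
# defaults to /proc/modules). Each line there starts with the
# module name followed by a space.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}
```

Something like "module_loaded xfs && echo pick another one" would then
tell you xfs is no good for the test. For the dependencies,
"modinfo -F depends xfs" prints the direct ones, and each of those can be
run through the same check.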

The last patch "modules/kmod: replace implementation with a semaphore"
just takes Linus' simplification suggestion to replace kmod's solution.
I tested it with:

tools/testing/selftests/kmod/kmod.sh -t 0008

This stress tests the kmod autoloading. Using time, the timing is
similar, perhaps mildly slower, but I like the simplification. I'm still
not happy with the "module: add concurrency limiter" patch, especially
since it doesn't seem to really buy us much; 2 MiB seems to be within
noise. But stress-ng clearly shows it is a possible source of issues as we
grow the ops. So it may make sense to just use the idea to replace the
delimiter for kmod for now, unless you see different results.

The stress-ng module test can at least be used now to replicate insane setups
at bootup.

Let me know if you get other useful results.

Luis