Re: [PATCH] capabilities: new kernel.ns_modules_allowed sysctl

From: Vegard Nossum
Date: Wed Aug 10 2022 - 04:26:21 EST



On 8/10/22 00:56, Kees Cook wrote:
> On Tue, Aug 09, 2022 at 08:52:29PM +0200, Vegard Nossum wrote:
>> Creating a new user namespace grants you the ability to reach a lot of code
>> (including loading certain kernel modules) that would otherwise be out of
>> reach of an attacker. We can reduce the attack surface and block exploits
>> by ensuring that user namespaces cannot trigger module (auto-)loading.
>>
>> A cursory search of exploits found online yields the following extremely
>> non-exhaustive list of vulnerabilities, and shows that the technique is
>> both old and still in use:
>>
>> - CVE-2016-8655
>> - CVE-2017-1000112
>> - CVE-2021-32606
>> - CVE-2022-2588
>> - CVE-2022-27666
>> - CVE-2022-34918
>>
>> This patch adds a new sysctl, kernel.ns_modules_allowed, which when set to
>> 0 will block requests to load modules when the request originates in a
>> process running in a user namespace.
>>
>> For backwards compatibility, the default value of the sysctl is set to
>> CONFIG_NS_MODULES_ALLOWED_DEFAULT_ON, which in turn defaults to 1, meaning
>> there should be absolutely no change in behaviour unless you opt in either
>> at compile time or at runtime.
>>
>> This mitigation obviously offers no protection if the vulnerable module is
>> already loaded, but for many of these exploits the vast majority of users
>> will never actually load or use these modules on purpose; in other words,
>> for the vast majority of users, this would block exploits for the above
>> list of vulnerabilities.
>
> We've needed better module autoloading protections for a long time[1].
> This patch is a big hammer ("all user namespaces"), so I worry it
> wouldn't actually get used much.
>
> Here's a pointer into a prior thread, where Linus chimed in[2].
> I replied back then, but I'm not sure I agree with my 2017 self any
> more. :P
>
> It really does feel like the loading decisions need to be made by the
> userspace helper, which currently doesn't have enough information to
> make those choices.
>
> -Kees
>
> [1] https://github.com/KSPP/linux/issues/24
> [2] https://lore.kernel.org/kernel-hardening/CA+55aFxiDKfe6VCM+aV2OgnkzMpP+iz+rn2k25_Qa_QLex=pPQ@xxxxxxxxxxxxxx/

Thanks for the pointers, I didn't have any of this context.

I would still argue for my patch with the following points:

1) As you said, it's been almost 7 years since the discussion you linked
and apparently it's still a problem (including those 5 privilege
escalation CVEs from my changelog); this relatively simple patch
provides a mitigation _today_

2) it can be layered with any other future mitigations if they do show up

3) it's not as big a hammer as completely disabling unprivileged user
namespaces, which seems to be the next best thing currently in terms of
protecting your users (as a distro)

4) both the implementation and the user interface are fairly simple in
my patch, which means it's not a huge long term maintenance burden like
block-/allowlists or capabilities based on whether modules are
maintained or not (I would also argue that "maintained or not" is not a
great proxy for whether there are security issues in the code)

5) it resembles other sysctls like unprivileged_bpf_disabled or
perf_event_paranoid, or even modules_disabled

6) it's opt-in by default, and even then, if you run into problems with
containers that don't work or whatever, the solution is extremely
simple: just load the modules you need before starting your container
(the module names are printed in the kernel log so it shouldn't be
difficult to track down issues)

What's the downside..?


Vegard