Re: [PATCH] capabilities: new kernel.ns_modules_allowed sysctl
From: Kees Cook
Date: Fri Aug 12 2022 - 14:48:45 EST

On Wed, Aug 10, 2022 at 10:25:17AM +0200, Vegard Nossum wrote:
>
> On 8/10/22 00:56, Kees Cook wrote:
> > On Tue, Aug 09, 2022 at 08:52:29PM +0200, Vegard Nossum wrote:
> >> Creating a new user namespace grants you the ability to reach a lot of code
> >> (including loading certain kernel modules) that would otherwise be out of
> >> reach of an attacker. We can reduce the attack surface and block exploits
> >> by ensuring that user namespaces cannot trigger module (auto-)loading.
> >>
> >> A cursory search of exploits found online yields the following extremely
> >> non-exhaustive list of vulnerabilities, and shows that the technique is
> >> both old and still in use:
> >>
> >> - CVE-2016-8655
> >> - CVE-2017-1000112
> >> - CVE-2021-32606
> >> - CVE-2022-2588
> >> - CVE-2022-27666
> >> - CVE-2022-34918
> >>
> >> This patch adds a new sysctl, kernel.ns_modules_allowed, which when set to
> >> 0 will block requests to load modules when the request originates in a
> >> process running in a user namespace.
> >>
> >> For backwards compatibility, the default value of the sysctl is set to
> >> CONFIG_NS_MODULES_ALLOWED_DEFAULT_ON, which in turn defaults to 1, meaning
> >> there should be absolutely no change in behaviour unless you opt in either
> >> at compile time or at runtime.
> >>
> >> This mitigation obviously offers no protection if the vulnerable module is
> >> already loaded, but for many of these exploits the vast majority of users
> >> will never actually load or use these modules on purpose; in other words,
> >> for the vast majority of users, this would block exploits for the above
> >> list of vulnerabilities.
> >
> > We've needed better module autoloading protections for a long time[1].
> > This patch is a big hammer ("all user namespaces"), so I worry it
> > wouldn't actually get used much.
> >
> > Here's a pointer into a prior thread, where Linus chimed in[2].
> > I replied back then, but I'm not sure I agree with my 2017 self any
> > more. :P
> >
> > It really does feel like the loading decisions need to be made by the
> > userspace helper, which currently doesn't have enough information to
> > make those choices.
> >
> > -Kees
> >
> > [1] https://github.com/KSPP/linux/issues/24
> > [2] https://lore.kernel.org/kernel-hardening/CA+55aFxiDKfe6VCM+aV2OgnkzMpP+iz+rn2k25_Qa_QLex=pPQ@xxxxxxxxxxxxxx/
>
> Thanks for the pointers, I didn't have any of this context.
>
> I would still argue for my patch with the following points:
>
> 1) As you said, it's been about 5 years since the discussion you linked
> and apparently it's still a problem (including those 6 privilege
> escalation CVEs from my changelog); this relatively simple patch
> provides a mitigation _today_
>
> 2) it can be layered with any other future mitigations if they do show up
>
> 3) it's not as big a hammer as completely disabling unprivileged user
> namespaces, which seems to be the next best thing currently in terms of
> protecting your users (as a distro)
>
> 4) both the implementation and the user interface are fairly simple in
> my patch, which means it's not a huge long term maintenance burden like
> block-/allowlists or capabilities based on whether modules are
> maintained or not (I would also argue that "maintained or not" is not a
> great proxy for whether there are security issues in the code)
>
> 5) it resembles other sysctls like unprivileged_bpf_disabled or
> perf_event_paranoid, or even modules_disabled
>
> 6) it's opt-in, and even then, if you run into problems with
> containers that don't work or whatever, the solution is extremely
> simple: just load the modules you need before starting your container
> (the module names are printed in the kernel log so it shouldn't be
> difficult to track down issues)
>
> What's the downside..?

I agree, it'd be nice to have. I'm just trying to predict what kind of
push-back there may be.

Can you address the build failures noted on the thread, and send a v2? I
note that after this patch it looks like all module loading from a userns
gets logged, regardless of the setting. Is that intended?

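
To make the logging question concrete, here is a rough userspace model of
the two behaviours. It is only a sketch: every name in it is hypothetical
and none of it is taken from the patch. It assumes the sysctl semantics
from the changelog (0 blocks requests coming from a user namespace) and
uses a flag to switch between "log every userns request" and "log only
blocked requests".

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names come from the patch. */
struct modreq_result {
	bool allowed;	/* would the module load proceed? */
	bool logged;	/* would a line land in the kernel log? */
};

/*
 * in_userns:      request originates in a non-initial user namespace
 * sysctl_allowed: models the value of kernel.ns_modules_allowed
 * log_always:     true models "log every userns request",
 *                 false models "log only when the request is blocked"
 */
static struct modreq_result model_request(bool in_userns, int sysctl_allowed,
					  bool log_always)
{
	struct modreq_result r = { .allowed = true, .logged = false };

	/* Requests from the initial namespace are unaffected in this model. */
	if (!in_userns)
		return r;

	r.allowed = (sysctl_allowed != 0);
	r.logged = log_always || !r.allowed;
	return r;
}
```

With log_always set, a userns request is logged even when the sysctl is 1,
which is the case I am asking about; with it clear, only blocked requests
produce a log line.
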
-Kees
--
Kees Cook