Re: [PATCH v4 1/1] exec: seal system mappings

From: Jeff Xu
Date: Mon Jan 13 2025 - 16:28:15 EST


On Mon, Jan 6, 2025 at 5:12 PM Kees Cook <kees@xxxxxxxxxx> wrote:
>
> On Fri, Jan 03, 2025 at 09:38:10PM +0000, Lorenzo Stoakes wrote:
> > On Tue, Dec 17, 2024 at 02:18:53PM -0800, Kees Cook wrote:
> > > On Mon, Nov 25, 2024 at 08:20:21PM +0000, jeffxu@xxxxxxxxxxxx wrote:
> > > > Seal vdso, vvar, sigpage, uprobes and vsyscall.
> > > >
> > > > Those mappings are readonly or executable only, sealing can protect
> > > > them from ever changing or unmapped during the life time of the process.
> > > > For complete descriptions of memory sealing, please see mseal.rst [1].
> > > >
> > > > System mappings such as vdso, vvar, and sigpage (for arm) are
> > > > generated by the kernel during program initialization, and are
> > > > sealed after creation.
> > > > [...]
> > > >
> > > > + exec.seal_system_mappings = [KNL]
> > > > + Format: { no | yes }
> > > > + Seal system mappings: vdso, vvar, sigpage, vsyscall,
> > > > + uprobe.
> > > > + - 'no': do not seal system mappings.
> > > > + - 'yes': seal system mappings.
> > > > + This overrides CONFIG_SEAL_SYSTEM_MAPPINGS=(y/n)
> > > > + If not specified or invalid, default is the value set by
> > > > + CONFIG_SEAL_SYSTEM_MAPPINGS.
> > > > + This option has no effect if CONFIG_64BIT=n
> > >
> > > I know there is a v5 coming, but I wanted to give my thoughts to help
> > > shape it based on the current discussion threads.
> > >
> > > The callers of _install_special_mapping() cover what is mentioned here.
> > > The vdso is very common (arm, arm64, csky, hexagon, loongarch, mips,
> > > parisc, powerpc, riscv, s390, sh, sparc, x86, um). For those with vdso,
> > > some also have vvar (arm, arm64, loongarch, mips, powerpc, riscv, s390,
> > > sparc, x86). After that, I see a few extra things, in addition to
> > > sigpage and uprobes as mentioned already in the patch:
> > >
> > > arm sigpage
> > > arm64 compat vectors (what is this for arm?)
> > > arm64 compat sigreturn (what is this for arm?)
> > > nios2 kuser helpers
> > > uprobes
> >
> > OK let's not get ahead of ourselves :)
> >
> > VDSOs/gate VMAs are treated quite differently by different arches. So we
> > have to tread _very_ carefully here.
> >
> > I believe PPC doe some 'tricky' things and may actually want to unmap, for
> > instance.
> >
> > The problem with this kind of change is we're doing something fundamental
> > that impacts _every possible combinatorial combination of configs, arches,
> > and use cases_ for each of these which we seeming - just assume - will have
> > no issue with this.
> >
> > This is insufficient, deeply. We need:
> >
> > 1. Strong justification (hand waving won't suffice).
> > 2. Very extensive testing and checking, and _proof_ of this testing being
> > performed.
> > 3. Buy-in from arch maintainers.
> >
> > So far this series has provided none of those. This is why I am cautious
> > and pushing back here.
>
> Sure, I agree. This is why I was suggested the ...ARCH_HAS... Kconfig.
> That will provide the way for 3) to happen. 1) just needs a little more
> details in the commit log, I guess? The goal is attack surface reduction
> in userspace, and remapping shenanigans have become a recent avenue of
> attack.
>
> For 2) there are limits. As you say we may have "every possible
> combinatorial combination", which may not be feasible to test. But
> making it available for the common cases (and of course testing those)
> makes sense.
>
> > And I absolutely will not accept a user being able to turn on a switch in a
> > known-broken configuration. This is absolutely unacceptable.
>
> Sure, of course.
>
> > It's equally unacceptable for a user to enable a feature that is
> > untested/confirmed on an architecture.
>
> Agreed.
>
> > So let's be careful about Linus's edict here - the operative part being 'if
> > it doesn't break things'.
>
> Right -- I should clarify: I don't mean to say "it should be enabled by
> default", I meant to say that we have a common pattern for making these
> kinds of features available without hiding them behind a build-time
> Kconfig that would have put the features out of reach for system owners
> that only use distro kernels, etc. I was pushing back on an earlier
> comment that I interpreted as rejecting boot params. A boot param (when
> other aspects of the system are sane) is needed for this kind of thing,
> and is the pattern we use for providing optional features that distros
> can make available without enabling them by default.
>
> > > So, if we want to have a CONFIG_MSEAL_SYSTEM_MAPPINGS at all, it should
> > > be "default y" since we have the ...ARCH_HAS... config already, and then
> > > add a CONFIG_MSEAL_SYSTEM_MAPPINGS_DEFAULT that is off by default (since
> > > we expect there may be userspace impact) and tie _that_ to the kernel
> > > command-line so that end users can use it, or system builders can enable
> > > CONFIG_MSEAL_SYSTEM_MAPPINGS_DEFAULT.
> >
> > Again, I hate to push on this, but I am simply not going to allow users to
> > enable features we know break things.
> >
> > Users might not be aware this feature is broken for CRIU, and X, and Y and
> > whatever else we've not thought about and enable it thinking it helps
> > security, and end up with a broken system.
>
> This will never be a bright line, and I think choice is more important.
> For example, Ubuntu builds with CRIU, but only a tiny set of tools
> actually use it. (I've actually been considering adding a boot param to
> disable CRIU features since they undermine some aspects of userspace
> security.)
>
> Regardless, yes, if we can make this work with CRIU (which I thought
> there seem to be consensus on), let's do it.
>
> > This seems like putting the onus on CRIU users to deal with a known-broken
> > thing? That seems really unreasonable? And people would just have to have
> > the right userland code to work in the kernel with mseal?
> >
> > Yeah I oppose entirely this unless I'm missing something?
>
> Hm, well, the primary goal is for Chrome OS and Android to use this. If
> there is honestly no path forward with CRIU, then hard Kconfig conflict
> it is. I'd much rather have it available for anyone who wants it, just
> like we do with lots of other features. Why force people who want this
> and not CRIU to build their own kernels? We have all kinds of boot params
> that if you set you get a broken system.
>
This patch is intended for ChromeOS and Android and is
feature-complete from their perspective.

To simplify v5, I propose removing kernel-cmd-line and avoiding the
complexities of CRIU/UML and gVisor. The KCONFIG is disabled by
default and will only apply to ARM and Intel architectures.

Later when a generic distribution wants to enable this feature, we can
work out a solution to handle those complexities.

Is this a reasonable path to move forward ?

Thanks
-Jeff

> -Kees
>
> --
> Kees Cook