Re: [RFC PATCH v1 0/1] seal system mappings
From: Liam R. Howlett
Date: Mon Oct 07 2024 - 22:20:39 EST
* jeffxu@xxxxxxxxxxxx <jeffxu@xxxxxxxxxxxx> [241004 12:32]:
> From: Jeff Xu <jeffxu@xxxxxxxxxx>
>
> Seal vdso, vvar, sigpage, uprobes and vsyscall.
>
> Those mappings are readonly or executable only, sealing can protect
> them from ever changing during the life time of the process.
>
> System mappings such as vdso, vvar, and sigpage (for arm) are
> generated by the kernel during program initialization. These mappings
> are designated as non-writable, and sealing them will prevent them
> from ever becoming writeable.

But it also means they cannot be unmapped, right?

I'm not saying it's something people should do, but recent conversations
with the ppc people seem to indicate that people do 'things' to the vdso,
such as removing it.

Won't this change mean they cannot do that, at least if mseal is enabled
on ppc64?  In which case we would need a different special mapping for
powerpc, or any other platform that wants to be able to unmap the vdso
(or vvar or whatever else).

In fact, I came across people removing the vdso to catch callers of
those functions which they didn't want to allow.  In this case, enabling
the security of mseal would prevent them from stopping applications from
making vdso calls.  Again, I'm not saying this is a good (or bad) idea,
but it is happening.
>
> Unlike the aforementioned mappings, the uprobe mapping is not
> established during program startup. However, its lifetime is the same
> as the process's lifetime [1], thus sealable.
>
> The vdso, vvar, sigpage, and uprobe mappings all invoke the
> _install_special_mapping() function. As no other mappings utilize this
> function, it is logical to incorporate sealing logic within
> _install_special_mapping(). This approach avoids the necessity of
> modifying code across various architecture-specific implementations.
>
> The vsyscall mapping, which has its own initialization function, is
> sealed in the XONLY case, it seems to be the most common and secure
> case of using vsyscall.
>
> It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> alter the mapping of vdso, vvar, and sigpage during restore
> operations. Consequently, this feature cannot be universally enabled
> across all systems. To address this, a kernel configuration option has
> been introduced to enable or disable this functionality. I tested
> CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, which doesn’t use
> CHECKPOINT_RESTORE, to verify the sealing works.
I am hesitant to say that CRIU is the only user of moving the vdso, as
the ppc people wanted the ability for the fallback methods to still
function when the vdso was unmapped.
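
The fallback idea can be sketched from userspace as follows.  This is an
illustration only, assuming a 64-bit Linux target where SYS_clock_gettime
uses the same struct timespec layout as userspace; clock_gettime_raw() is
a hypothetical name.  It reads the clock through the real system call, so
it does not depend on the vdso being mapped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read CLOCK_MONOTONIC via the raw syscall instead
 * of the libc wrapper, which may dispatch into the vdso.
 * Returns 0 on success, -1 with errno set on failure. */
static int clock_gettime_raw(struct timespec *ts)
{
	assert(ts != NULL);
	return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

This path is slower than the vdso one because it enters the kernel on
every call, but it keeps working whether or not the vdso is present.
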
I am not sure we can change user-expected behaviour based on a
configuration option; users may be able to mmap/munmap but may not be
able to boot their own kernel.  But maybe that's okay?
>
> [1] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@xxxxxxxxxxxxxx/
>
> Jeff Xu (1):
> exec: seal system mappings
>
> .../admin-guide/kernel-parameters.txt | 9 ++++
> arch/x86/entry/vsyscall/vsyscall_64.c | 9 +++-
> fs/exec.c | 53 +++++++++++++++++++
> include/linux/fs.h | 1 +
> mm/mmap.c | 1 +
> security/Kconfig | 26 +++++++++
> 6 files changed, 97 insertions(+), 2 deletions(-)
>
> --
> 2.47.0.rc0.187.ge670bccf7e-goog
>