[RFC PATCH v2 0/1] seal system mappings

From: jeffxu
Date: Mon Oct 14 2024 - 17:50:41 EST


From: Jeff Xu <jeffxu@xxxxxxxxxxxx>

Seal vdso, vvar, sigpage, uprobes and vsyscall.

Those mappings are read-only or execute-only. Sealing protects them
from being changed for the lifetime of the process. For a complete
description of memory sealing, please see mseal.rst [1].

System mappings such as vdso, vvar, and sigpage (on arm) are
created by the kernel during program initialization. These mappings
are designated as non-writable, and sealing them prevents them
from ever becoming writable.
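
As a rough illustration of what sealing means from userspace, the
sketch below probes the vdso with mprotect(). It is not part of the
patch. It assumes that a sealed mapping rejects mprotect() with EPERM
(as mseal.rst describes for sealed memory) and that the rejection
happens even when the requested protection matches the current one.
The probe asks for PROT_READ | PROT_EXEC, which is assumed to be what
the vdso text already has, so on an unsealed kernel the call should be
a harmless no-op. The names seal_probe, classify_mprotect() and
probe_vdso_seal() are made up for this example.

```c
#include <assert.h>
#include <elf.h>
#include <errno.h>
#include <stddef.h>
#include <sys/auxv.h>
#include <sys/mman.h>
#include <unistd.h>

enum seal_probe { PROBE_SEALED, PROBE_NOT_SEALED, PROBE_UNKNOWN };

/* Map an mprotect() return value and errno to a probe result. */
static enum seal_probe classify_mprotect(int rc, int err)
{
	if (rc == 0)
		return PROBE_NOT_SEALED;	/* the kernel accepted the call */
	if (err == EPERM)
		return PROBE_SEALED;		/* assumed signature of a sealed VMA */
	return PROBE_UNKNOWN;			/* some unrelated failure */
}

/* Ask for the vdso's assumed current protection on its first page. */
static enum seal_probe probe_vdso_seal(void)
{
	unsigned long vdso = getauxval(AT_SYSINFO_EHDR);
	long page = sysconf(_SC_PAGESIZE);
	int rc;

	assert(page > 0);
	if (vdso == 0)
		return PROBE_UNKNOWN;		/* no vdso reported for this process */

	vdso &= ~((unsigned long)page - 1);	/* align down to a page boundary */
	errno = 0;
	rc = mprotect((void *)vdso, (size_t)page, PROT_READ | PROT_EXEC);
	return classify_mprotect(rc, errno);
}
```

Calling probe_vdso_seal() from a test program and printing the result
should be enough to compare a kernel built with sealing against one
without it.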

Unlike the mappings above, the uprobe mapping is not established
during program startup. However, its lifetime matches the lifetime
of the process [2], so it can be sealed as well.

The vdso, vvar, sigpage, and uprobe mappings all invoke the
_install_special_mapping() function. As no other mappings utilize this
function, it is logical to incorporate sealing logic within
_install_special_mapping(). This approach avoids the necessity of
modifying code across various architecture-specific implementations.
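
The benefit of a single choke point can be shown with a small
userspace model. This is a toy sketch only, not the kernel code from
the patch: the flag values, struct model_mapping and
model_install_special_mapping() are all invented here, and the real
function has a different signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag bits invented for this model; they are NOT the kernel's VM_* values. */
#define MODEL_VM_READ	0x1UL
#define MODEL_VM_EXEC	0x4UL
#define MODEL_VM_SEALED	0x8000UL

struct model_mapping {
	const char *name;
	unsigned long flags;
};

/*
 * Hypothetical stand-in for the shared helper. Every special mapping is
 * created through this one function, so the sealing decision is made once
 * here instead of in each architecture's setup code.
 */
static struct model_mapping
model_install_special_mapping(const char *name, unsigned long flags,
			      bool seal_enabled)
{
	assert(name != NULL);
	if (seal_enabled)
		flags |= MODEL_VM_SEALED;
	return (struct model_mapping){ .name = name, .flags = flags };
}
```

In this model, the callers for vdso, vvar and sigpage pass their usual
flags unchanged, and only the helper knows whether sealing is enabled.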

The vsyscall mapping, which has its own initialization function, is
sealed in the XONLY case, which appears to be the most common and
most secure way of using vsyscall.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the mapping of vdso, vvar, and sigpage during restore
operations. Consequently, sealing of these mappings cannot be enabled
universally across all systems. To address this, a kernel
configuration option has been introduced to enable or disable it.
Note that the uprobe mapping is always sealed and is not controlled
by this kernel configuration option.

I tested CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS,
which doesn't use CHECKPOINT_RESTORE.

[1] Documentation/userspace-api/mseal.rst
[2] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@xxxxxxxxxxxxxx/

History:
V2:
Seal uprobe always (Oleg Nesterov)
Update comments and description (Randy Dunlap, Liam R. Howlett, Oleg Nesterov)
Rebase to linux_main

V1:
https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@xxxxxxxxxx/

Jeff Xu (1):
  exec: seal system mappings

 .../admin-guide/kernel-parameters.txt | 10 ++++
 arch/x86/entry/vsyscall/vsyscall_64.c |  9 +++-
 fs/exec.c                             | 53 +++++++++++++++++++
 include/linux/fs.h                    |  1 +
 kernel/events/uprobes.c               |  2 +-
 mm/mmap.c                             |  1 +
 security/Kconfig                      | 26 +++++++++
 7 files changed, 99 insertions(+), 3 deletions(-)

--
2.47.0.rc1.288.g06298d1525-goog