Re: [RFC PATCH 00/16] pkeys-based page table hardening

From: Jann Horn
Date: Fri Dec 06 2024 - 14:15:42 EST

Next message: Linus Torvalds: "Re: [PATCH 02/10] compiler.h: add is_const() as a replacement of __is_constexpr()"
Previous message: tip-bot2 for Kirill A. Shutemov: "[tip: x86/mm] x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()"
In reply to: Kevin Brodsky: "[RFC PATCH 16/16] mm: Add basic tests for kpkeys_hardened_pgtables"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Dec 6, 2024 at 11:13 AM Kevin Brodsky <kevin.brodsky@xxxxxxx> wrote:
> This is a proposal to leverage protection keys (pkeys) to harden
> critical kernel data, by making it mostly read-only. The series includes
> a simple framework called "kpkeys" to manipulate pkeys for in-kernel use,
> as well as a page table hardening feature based on that framework
> (kpkeys_hardened_pgtables). Both are implemented on arm64 as a proof of
> concept, but they are designed to be compatible with any architecture
> implementing pkeys.
>
> The proposed approach is a typical use of pkeys: the data to protect is
> mapped with a given pkey P, and the pkey register is initially configured
> to grant read-only access to P. Where the protected data needs to be
> written to, the pkey register is temporarily switched to grant write
> access to P on the current CPU.
>
> The key fact this approach relies on is that the target data is
> only written to via a limited and well-defined API. This makes it
> possible to explicitly switch the pkey register where needed, without
> introducing excessively invasive changes, and only for a small amount of
> trusted code.
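
The "switch the pkey register only around the write" pattern described in the two paragraphs above can be sketched as a small userspace model in C. This is only an illustration, not code from the series: the register is modelled as a plain array, a store that hardware would fault on simply returns false, and every identifier (pkey_reg, try_write, protected_set, ...) is hypothetical.

```c
/* Userspace model only; all names are hypothetical, not from the series. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PKEY_DEFAULT = 0, PKEY_P = 1, NR_PKEYS = 2 };
enum { PERM_R = 1u, PERM_W = 2u };

/* Stand-in for the per-CPU pkey register: one permission mask per key. */
static uint8_t pkey_reg[NR_PKEYS] = {
	[PKEY_DEFAULT] = PERM_R | PERM_W,
	[PKEY_P]       = PERM_R,	/* protected data is read-only by default */
};

struct page {
	int pkey;	/* key the page is mapped with */
	uint64_t data;
};

/* Models the permission check performed on a store. */
static bool try_write(struct page *pg, uint64_t val)
{
	if (!(pkey_reg[pg->pkey] & PERM_W))
		return false;	/* real hardware would fault here */
	pg->data = val;
	return true;
}

/* The limited, well-defined API: the only path that raises permissions. */
static bool protected_set(struct page *pg, uint64_t val)
{
	uint8_t saved = pkey_reg[PKEY_P];
	bool ok;

	pkey_reg[PKEY_P] = PERM_R | PERM_W;	/* grant write access to P */
	ok = try_write(pg, val);
	pkey_reg[PKEY_P] = saved;		/* restore the previous state */
	return ok;
}
```

In this model a direct try_write() to a page tagged PKEY_P fails, while the same store issued through protected_set() succeeds and leaves the register as it found it.
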
>
> Page tables were chosen as they are a popular (and critical) target for
> attacks, but there are of course many others - this is only a starting
> point (see section "Further use-cases"). It has become more and more
> common for accesses to such target data to be mediated by a hypervisor
> in vendor kernels; the hope is that kpkeys can provide much of that
> protection in a simpler manner. No benchmarking has been performed at
> this stage, but the runtime overhead should also be lower (though likely
> not negligible).

Yeah, it isn't great that vendor kernels contain such invasive changes...

I guess one difference between this approach and a hypervisor-based
approach is that a hypervisor that uses a second layer of page tables
can also prevent access through aliasing mappings, while pkeys only
prevent access through a specific mapping? (For example, an attacker
might manage to add a page that is mapped into userspace to a page
allocator freelist, get this page allocated as a page table, and then
use the userspace mapping to write into this page table. But I guess
whether that is an issue depends on the threat model.)
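
To make the aliasing concern concrete, here is a minimal userspace model in C. It assumes a simplified world where a pkey is a property of a mapping and a second-stage permission is a property of the physical page; how a hypervisor would authorise legitimate page table writes is left out. All names (phys_page, mapping, write_pkeys_only, write_with_stage2) are hypothetical.

```c
/* Userspace model only; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PKEY_DEFAULT = 0, PKEY_PGTABLES = 1, NR_PKEYS = 2 };

struct phys_page {
	uint64_t data;
	bool stage2_writable;	/* second-stage view, keyed on the physical page */
};

struct mapping {
	struct phys_page *page;
	int pkey;		/* the pkey belongs to this mapping, not to the page */
};

/* Default register state in the model: the page table pkey is read-only. */
static bool pkey_writable[NR_PKEYS] = {
	[PKEY_DEFAULT]  = true,
	[PKEY_PGTABLES] = false,
};

/* pkeys alone: the check only looks at the mapping used for the access. */
static bool write_pkeys_only(struct mapping *m, uint64_t val)
{
	if (!pkey_writable[m->pkey])
		return false;
	m->page->data = val;
	return true;
}

/* With a second layer of page tables, the physical page is checked too. */
static bool write_with_stage2(struct mapping *m, uint64_t val)
{
	if (!m->page->stage2_writable)
		return false;
	return write_pkeys_only(m, val);
}
```

With one page reachable through both a PKEY_PGTABLES mapping and a stale PKEY_DEFAULT alias, the pkeys-only check rejects the write through the first mapping but accepts it through the alias, while the second-stage check rejects both.
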

> # kpkeys_hardened_pgtables
>
> The kpkeys_hardened_pgtables feature uses the interface above to make
> the (kernel and user) page tables read-only by default, enabling write
> access only in helpers such as set_pte(). One complication is that those
> helpers as well as page table allocators are used very early, before
> kpkeys become available. Enabling kpkeys_hardened_pgtables, if and when
> kpkeys become available, is therefore done as follows:
>
> 1. A static key is turned on. This enables a transition to
> KPKEYS_LVL_PGTABLES in all helpers writing to page tables, and also
> impacts page table allocators (see step 3).
>
> 2. All pages holding kernel page tables are set to KPKEYS_PKEY_PGTABLES.
> This ensures they can only be written when running at
> KPKEYS_LVL_PGTABLES.
>
> 3. Page table allocators set the returned pages to KPKEYS_PKEY_PGTABLES
> (and the pkey is reset upon freeing). This ensures that all page
> tables are mapped with that privileged pkey.
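
The three enablement steps quoted above can be modelled in userspace C roughly as follows. This is a sketch under assumptions, not the series' implementation: a plain bool stands in for the static key, a small array stands in for the page allocator, the model does not distinguish kernel from user page tables, and all names (hardening_enabled, pgtable_alloc, pgtable_free, hardening_enable) are hypothetical.

```c
/* Userspace model only; all names are hypothetical, not from the series. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { PKEY_DEFAULT = 0, PKEY_PGTABLES = 1 };
#define NR_PAGES 4

struct page {
	bool in_use;
	bool is_pgtable;
	int pkey;
};

static struct page pages[NR_PAGES];
static bool hardening_enabled;	/* stand-in for the static key */

/* Step 3: once enabled, the allocator tags every page table page it returns. */
static struct page *pgtable_alloc(void)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		struct page *pg = &pages[i];

		if (pg->in_use)
			continue;
		pg->in_use = true;
		pg->is_pgtable = true;
		if (hardening_enabled)
			pg->pkey = PKEY_PGTABLES;
		return pg;
	}
	return NULL;
}

/* Step 3, continued: the pkey is reset when the page is freed. */
static void pgtable_free(struct page *pg)
{
	pg->pkey = PKEY_DEFAULT;
	pg->is_pgtable = false;
	pg->in_use = false;
}

static void hardening_enable(void)
{
	hardening_enabled = true;		/* step 1: flip the key */
	for (size_t i = 0; i < NR_PAGES; i++)	/* step 2: retag early tables */
		if (pages[i].in_use && pages[i].is_pgtable)
			pages[i].pkey = PKEY_PGTABLES;
}
```

A page table allocated before hardening_enable() starts with the default pkey and is retagged in step 2; one allocated afterwards is tagged by the allocator directly, and freeing it returns the page to the default pkey.
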
>
> # Threat model
>
> The proposed scheme aims at mitigating data-only attacks (e.g.
> use-after-free/cross-cache attacks). In other words, it is assumed that
> control flow is not corrupted, and that the attacker does not achieve
> arbitrary code execution. Nothing prevents the pkey register from being
> set to its most permissive state - the assumption is that the register
> is only modified on legitimate code paths.

Is the threat model that the attacker has already achieved full
read/write access to unprotected kernel data and should be stopped
from gaining write access to protected data? Or is the threat model
that the attacker has achieved some limited corruption, and this
series is intended to make it harder to either gain write access to
protected data or achieve full read/write access to unprotected data?
