Re: [RFC PATCH v3 00/15] pkeys-based page table hardening
From: Kevin Brodsky
Date: Tue Mar 25 2025 - 13:11:43 EST
On 19/03/2025 22:54, Maxwell Bland wrote:
> [...]
>
> Onto the solution. In a pure-immutability approach, I ended up creating
> some definitions regarding what an "expected" virtual-to-physical
> mapping of kernel memory or change in page permissions is (my solution
> was to treat init_mm as a ground truth structure which is only RW under
> a more refined set of constraints on mapping updates and permission set
> updates), for example,
>
> if (get_tag(cur_pte & EL1_PADDR_MASK) == SECCOMP_PAGE) {
> // run BPF re-verifier to avoid the Qualys 2021 CVE PoC magic
>
> or forcing PXN for everything always unless some other check says we
> absolutely have to make this PX:
>
> if (!cur_pte && new_pte) {
> if (!(new_pte & pxn_bit)) {
> new_pte |= pxn_bit;
>
> And then a bunch of additional infrastructure to define what a "safe
> mapping of physical memory looks like", which introduces its own set of
> problems for defining how/where vmalloc allocates from, physical address
> layout randomization, ...
>
> Such an explication of the expected possible modifications to PTEs may
> be needed? I don't understand whether just adding guard() around set_pte
> helps this case.
If I understood correctly the case you're referring to, the simple
answer is that it doesn't. If some driver explicitly updates page
tables, and then some exploit hijacks those pgtable updates, then kpkeys
won't help at all. I see two separate problems here: ensuring that the
appropriate interface is used to write to pgtables (or other objects),
and then constraining which pgtables exactly a certain context can
manipulate via this interface. kpkeys addresses the first problem, which
you could say is a low-hanging fruit. The second problem clearly needs
addressing for a strong protection of page tables, but I don't think
kpkeys can directly help with that.
> [...]
>
>>> Where overwriting f_op is a "classic" bypass of protection systems like
>>> this one.
>> Indeed, this has been pointed out to me on a few occasions. On that note
>> I should reference the follow-up series that aims at protecting struct
>> cred [1].
>>
>> Using kpkeys to ensure that the task->cred pointer itself is mostly
>> read-only is not straightforward, because there is just too much
>> existing code that directly writes to other fields in task_struct -
>> meaning we can't simply make the whole task_struct mostly read-only.
>> What could be done is to split out the cred fields in a separate
>> kpkeys-protected page, and link that page from task_struct. There is a
>> clear overhead associated with this approach though, and maybe more
>> importantly we have just displaced the issue as the pointer to that
>> protected page remains writeable... I'm not sure how much a page-based
>> technique like pkeys can help here.
> __randomize_layout accomplished some of what is necessary here, while
> accommodating contemporary hardware limitations, albeit to a finer
> granularity and lesser degree. I am also not sure how well it works on
> production systems (yet! It is on my TODO list in the short term), or
> how much protection it really implies for an adversary with a "read
> gadget" that can determine where to write (either from associated asm or
> from the struct itself).
>
> But...
>
> (1) After thinking about this for a few hours, __randomize_layout implies
> something valuable: under circumstances where this is supported, because
> we assume the struct's layout itself is modifable, it becomes possible
> to introduce a canary value _not_ for the modifiable portions, but just
> for the non-modifiable portions, store this canary value in a strictly
> read-only region of memory that also tracks pointers to the structure,
> and ensure uses of the non-modifiable fields must always leverage the
> canary value stored in this secondary region.
I don't think this can be enforced with pkeys, since the overall struct
would have to remain RW. But I'm not sure I completely understand how
that scheme would work overall, or how __randomize_layout is a
prerequisite for it.
> [...]
>
> It may be possible to write a plugin modification to
>
> https://github.com/gcc-mirror/gcc/blob/12b2c414b6d0e0d1b3d328b58d654c19c30bee8c/gcc/config/arm/aarch-bti-insert.cc
>
> To also inject the POE/register set instructions conditioned upon the an
> expected privilege (kpkeys) level of the executing function.
>
> Instead of adding code to set the privilege state via a specialized
> callgate function, having this function's call explicitly tied into the
> GCC CFI instrumentation pass. Determination of the specific "key level",
> i.e. KPKEYS_LVL_PGTABLES, could likely be determined by the file
> name/path via the compiler, or many other clever mechanisms, potentially
> with support for exceptions via function attribute tags.
Right I'm with you now. This is actually something we considered, but
I'm not sure it really helps all that much. Adding a guard object to a
function is basically the same amount of churn as marking that function
with an attribute, except it doesn't require compiler support. It would
get more useful if you wanted all functions in a file to switch kpkeys
level (using a compiler flag), but for the purpose of making certain
data mostly read-only, I doubt you actually want that - the granularity
is just too coarse.
A different angle would be use an attribute to mark struct members,
rather than functions. That would instruct the compiler to switch kpkeys
level whenever that member is written (and only then). You would
probably want to forbid taking the address of the member too, as the
compiler can't easily track that. That said this doesn't solve the
bigger issue of existing code expecting to be able to write to other
members in the struct. It most likely works only if the kpkeys switch is
done when writing to any member of the struct (which may get very
expensive).
> [...]
> Noticed as well, just now, the reliance on the static_key for branch
> unlikely ... the following data structure (at the end of this email) may
> be of interest as an FYI ... it can be used to track whether the kernel
> self patching API is only being used to patch expected locations.
>
> I use this right now with additional checks that the instruction being
> written for the indirect branch matches the expected offset.
Are there exploits out there corrupting the address of a static branch?
This seems difficult to me in practice because the __jump_table section
where the addresses of instructions to be patched are stored is marked
__ro_after_init.
- Kevin