Re: [RFC PATCH 4/4] x86/mm: write protect (most) page tables

From: Nadav Amit
Date: Tue Aug 24 2021 - 01:34:50 EST


Sorry for sending twice. The mail app decided to use HTML for some
reason.

On Aug 23, 2021, at 10:32 PM, Nadav Amit <nadav.amit@xxxxxxxxx> wrote:

>
> On Aug 23, 2021, at 6:25 AM, Mike Rapoport <rppt@xxxxxxxxxx> wrote:
>
> From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
>
> Allocate page table using __GFP_PTE_MAPPED so that they will have 4K PTEs
> in the direct map. This allows to switch _PAGE_RW bit each time a page
> table page needs to be made writable or read-only.
>
> The writability of the page tables is toggled only in the lowest level page
> table modifiction functions and immediately switched off.
>
> The page tables created early in the boot (including the direct map page
> table) are not write protected.
>
>

[ snip ]

> +static void pgtable_write_set(void *pg_table, bool set)
> +{
> + int level = 0;
> + pte_t *pte;
> +
> + /*
> + * Skip the page tables allocated from pgt_buf break area and from
> + * memblock
> + */
> + if (!after_bootmem)
> + return;
> + if (!PageTable(virt_to_page(pg_table)))
> + return;
> +
> + pte = lookup_address((unsigned long)pg_table, &level);
> + if (!pte || level != PG_LEVEL_4K)
> + return;
> +
> + if (set) {
> + if (pte_write(*pte))
> + return;
> +
> + WRITE_ONCE(*pte, pte_mkwrite(*pte));

I think that the pte_write() test (and the following one) might hide
latent bugs. Either you know whether the PTE is write-protected or you
need to protect against nested/concurrent calls to pgtable_write_set()
by disabling preemption/IRQs.

Otherwise, you risk in having someone else write-protecting the PTE
after it is write-unprotected and before it is written - causing a crash,
or write-unprotecting it after it is protected - which circumvents the
protection.

Therefore, I would think that instead you should have:

VM_BUG_ON(pte_write(*pte)); // (or WARN_ON_ONCE())

In addition, if there are assumptions on the preemptability of the code,
it would be nice to have some assertions. I think that the code assumes
that all calls to pgtable_write_set() are done while holding the
page-table lock. If that is the case, perhaps adding some lockdep
assertion would also help to confirm the correctness.

[ I put aside the lack of TLB flushes, which make the whole matter of
delivered protection questionable. I presume that once PKS is used,
this is not an issue. ]