Re: [RFC PATCH 4/4] x86/mm: write protect (most) page tables

From: Dave Hansen
Date: Wed Aug 25 2021 - 10:59:56 EST


On 8/23/21 8:34 PM, Andy Lutomirski wrote:
>> I would expected this to have leveraged the pte_offset_map/unmap() code
>> to enable/disable write access. Granted, it would enable write access
>> even when only a read is needed, but that could be trivially fixed with
>> having a variant like:
>>
>> pte_offset_map_write()
>> pte_offset_unmap_write()
> I would also like to see a discussion of how races in which multiple
> threads or CPUs access ptes in the same page at the same time.

Yeah, the 32-bit highmem code has a per-cpu mapping area for
kmap_atomic() that lies underneath the pte_offset_map(). Although it's
in the shared kernel mapping, only one CPU uses it.

I didn't look at it until you mentioned it, but the code in this set is
just plain buggy if two CPUs do a
enable_pgtable_write()/disable_pgtable_write(). They'll clobber each other:

> +static void pgtable_write_set(void *pg_table, bool set)
> +{
> + int level = 0;
> + pte_t *pte;
> +
> + /*
> + * Skip the page tables allocated from pgt_buf break area and from
> + * memblock
> + */
> + if (!after_bootmem)
> + return;
> + if (!PageTable(virt_to_page(pg_table)))
> + return;
> +
> + pte = lookup_address((unsigned long)pg_table, &level);
> + if (!pte || level != PG_LEVEL_4K)
> + return;
> +
> + if (set) {
> + if (pte_write(*pte))
> + return;
> +
> + WRITE_ONCE(*pte, pte_mkwrite(*pte));
> + } else {
> + if (!pte_write(*pte))
> + return;
> +
> + WRITE_ONCE(*pte, pte_wrprotect(*pte));
> + }
> +}