Re: [PATCH mm-unstable] mm/khugepaged: fix collapse_pte_mapped_thp() versus uffd

From: David Hildenbrand
Date: Tue Aug 22 2023 - 11:34:40 EST


On 22.08.23 17:30, Jann Horn wrote:
On Tue, Aug 22, 2023 at 5:23 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
On Tue, Aug 22, 2023 at 04:39:43PM +0200, Jann Horn wrote:
Perhaps something else will want that same behaviour in future (it's
tempting, but difficult to guarantee correctness); for now, it is just
userfaultfd (but by saying "_armed" rather than "_missing", I'm half-
expecting uffd to add more such exceptional modes in future).

Hm, yeah, sounds okay. (I guess we'd also run into this if we ever
wanted to make it possible to reliably install PTE markers with
madvise() or something like that, which might be nice for allowing
userspace to create guard pages without unnecessary extra VMAs...)

I don't know what a userspace API for this would look like, but I have
a dream of creating guard VMAs which only live in the maple tree and
don't require the allocation of a struct VMA. Use some magic reserved
pointer value like XA_ZERO_ENTRY to represent them ... seems more
robust than putting a PTE marker in the page tables?

Chrome currently uses a lot of VMAs for its heap, which I think are
basically alternating PROT_NONE and PROT_READ|PROT_WRITE anonymous
VMAs. Like this:

[...]
3a10002cf000-3a10002d0000 ---p 00000000 00:00 0
3a10002d0000-3a10002e6000 rw-p 00000000 00:00 0
3a10002e6000-3a10002e8000 ---p 00000000 00:00 0
3a10002e8000-3a10002f2000 rw-p 00000000 00:00 0
3a10002f2000-3a10002f4000 ---p 00000000 00:00 0
3a10002f4000-3a10002fb000 rw-p 00000000 00:00 0
3a10002fb000-3a10002fc000 ---p 00000000 00:00 0
3a10002fc000-3a1000303000 rw-p 00000000 00:00 0
3a1000303000-3a1000304000 ---p 00000000 00:00 0
3a1000304000-3a100031b000 rw-p 00000000 00:00 0
3a100031b000-3a100031c000 ---p 00000000 00:00 0
3a100031c000-3a1000326000 rw-p 00000000 00:00 0
3a1000326000-3a1000328000 ---p 00000000 00:00 0
3a1000328000-3a100033a000 rw-p 00000000 00:00 0
3a100033a000-3a100033c000 ---p 00000000 00:00 0
3a100033c000-3a100038b000 rw-p 00000000 00:00 0
3a100038b000-3a100038c000 ---p 00000000 00:00 0
3a100038c000-3a100039b000 rw-p 00000000 00:00 0
3a100039b000-3a100039c000 ---p 00000000 00:00 0
3a100039c000-3a10003af000 rw-p 00000000 00:00 0
3a10003af000-3a10003b0000 ---p 00000000 00:00 0
3a10003b0000-3a10003e8000 rw-p 00000000 00:00 0
3a10003e8000-3a1000401000 ---p 00000000 00:00 0
3a1000401000-3a1000402000 rw-p 00000000 00:00 0
3a1000402000-3a100040c000 ---p 00000000 00:00 0
3a100040c000-3a100046f000 rw-p 00000000 00:00 0
3a100046f000-3a1000470000 ---p 00000000 00:00 0
3a1000470000-3a100047a000 rw-p 00000000 00:00 0
3a100047a000-3a100047c000 ---p 00000000 00:00 0
3a100047c000-3a1000492000 rw-p 00000000 00:00 0
3a1000492000-3a1000494000 ---p 00000000 00:00 0
3a1000494000-3a10004a2000 rw-p 00000000 00:00 0
3a10004a2000-3a10004a4000 ---p 00000000 00:00 0
3a10004a4000-3a10004b6000 rw-p 00000000 00:00 0
3a10004b6000-3a10004b8000 ---p 00000000 00:00 0
3a10004b8000-3a10004ea000 rw-p 00000000 00:00 0
3a10004ea000-3a10004ec000 ---p 00000000 00:00 0
3a10004ec000-3a10005f4000 rw-p 00000000 00:00 0
3a10005f4000-3a1000601000 ---p 00000000 00:00 0
3a1000601000-3a1000602000 rw-p 00000000 00:00 0
3a1000602000-3a1000604000 ---p 00000000 00:00 0
3a1000604000-3a100062b000 rw-p 00000000 00:00 0
3a100062b000-3a1000801000 ---p 00000000 00:00 0
[...]

I was thinking if you used PTE markers as guards, you could maybe turn
all that into more or less a single VMA?
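
For illustration only, here is a minimal userspace sketch of how a layout like the one above comes about, assuming a Linux system with /proc mounted. It is not Chrome's allocator; the helper names map_guarded_heap() and count_vmas_in_range() are hypothetical. It reserves one PROT_NONE anonymous region and then makes every second page read-write with mprotect(). Because neighbouring ranges end up with different protections, the kernel cannot merge them, so each guard/data pair costs two VMAs.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the VMAs listed in /proc/self/maps that overlap [start, end).
 * Returns -1 if the file cannot be opened. */
static int count_vmas_in_range(uintptr_t start, uintptr_t end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[8192];
    int count = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned long lo, hi;

        /* Each line starts with "<start>-<end>" in hex. */
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 && lo < end && hi > start)
            count++;
    }
    fclose(f);
    return count;
}

/* Reserve 2 * pairs pages as PROT_NONE, then make every second page
 * read-write, giving the layout: guard, data, guard, data, ...
 * Returns the base address, or MAP_FAILED on error. */
static void *map_guarded_heap(size_t pairs, size_t *len_out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 2 * pairs * page;
    char *base;

    assert(pairs > 0);
    base = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    for (size_t i = 0; i < pairs; i++) {
        /* Changing the protection of a sub-range splits the VMA. */
        if (mprotect(base + (2 * i + 1) * page, page,
                     PROT_READ | PROT_WRITE) != 0) {
            munmap(base, len);
            return MAP_FAILED;
        }
    }
    *len_out = len;
    return base;
}
```

Calling map_guarded_heap(8, &len) and then count_vmas_in_range() over the returned range should report 16 VMAs for 16 pages, which is the per-guard overhead that PTE markers (or guard entries living only in the maple tree) would aim to avoid.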

I proposed the topic "A proper API for sparse memory mappings" for the bi-weekly MM meeting on September 20; that would also cover exactly this use case. :)

--
Cheers,

David / dhildenb