Re: [PATCH v1 10/15] mm/page-flags: reuse PG_slab as PG_anon_exclusive for PageAnon() pages

From: David Hildenbrand
Date: Fri Mar 11 2022 - 13:47:06 EST


On 08.03.22 15:14, David Hildenbrand wrote:
> The basic question we would like to have a reliable and efficient answer
> to is: is this anonymous page exclusive to a single process or might it
> be shared?
>
> In an ideal world, we'd have a spare pageflag. Unfortunately, pageflags
> don't grow on trees, so we have to get a little creative for the time
> being.
>
> Introduce a way to mark an anonymous page as exclusive, with the
> ultimate goal of teaching our COW logic to not do "wrong COWs", whereby
> GUP pins lose consistency with the pages mapped into the page table,
> resulting in reported memory corruptions.
>
> Most pageflags already have semantics for anonymous pages, so we're left
> with reusing PG_slab for our purpose: for PageAnon() pages PG_slab now
> translates to PG_anon_exclusive. Teach some in-kernel code that manually
> handles PG_slab about that.
>
> Add a spoiler on the semantics of PG_anon_exclusive as documentation. More
> documentation will be contained in the code that actually makes use of
> PG_anon_exclusive.
>
> We won't be clearing PG_anon_exclusive on destructive unmapping (i.e.,
> zapping) of page table entries; the page freeing code will handle that
> when it also invalidates page->mapping so that it no longer indicates
> PageAnon().
> Letting information about exclusivity stick around will be an important
> property when adding sanity checks to unpinning code.
>
> RFC notes: in-tree tools/cgroup/memcg_slabinfo.py looks like it might need
> some care. We'd have to look up the head page and check if
> PageAnon() is set. Similarly, tools living outside the kernel
> repository like crash and makedumpfile might need adaptations.
>
> Cc: Roman Gushchin <guro@xxxxxx>
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
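
For readers skimming the thread, the flag aliasing described in the quoted
commit message can be modelled in plain userspace C roughly as below. This is
only a toy sketch: struct toy_page, the TOY_* constants and the toy_*()
helpers are hypothetical names invented for illustration, and the bit
positions are assumptions. None of it is the kernel's actual definition or
the code in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: names, bit positions and layout are illustrative assumptions. */
#define TOY_PG_SLAB            (1UL << 0)
#define TOY_PG_ANON_EXCLUSIVE  TOY_PG_SLAB  /* same bit, reinterpreted for anon pages */
#define TOY_MAPPING_ANON       0x1UL        /* low bit of mapping marks an anon page */

struct toy_page {
	unsigned long flags;
	uintptr_t mapping;
};

/* Stand-in for PageAnon(): decided by the mapping field, not by a page flag. */
static bool toy_page_anon(const struct toy_page *page)
{
	return (page->mapping & TOY_MAPPING_ANON) != 0;
}

/* The shared bit only means "slab" when the page is not anonymous. */
static bool toy_page_slab(const struct toy_page *page)
{
	return !toy_page_anon(page) && (page->flags & TOY_PG_SLAB);
}

/* The shared bit only means "exclusive" when the page is anonymous. */
static bool toy_page_anon_exclusive(const struct toy_page *page)
{
	return toy_page_anon(page) && (page->flags & TOY_PG_ANON_EXCLUSIVE);
}

/*
 * Zapping a page table entry leaves the flag untouched in this model.
 * Only freeing clears it, at the same point where the mapping is
 * invalidated so the page no longer looks anonymous.
 */
static void toy_free_page(struct toy_page *page)
{
	if (toy_page_anon(page))
		page->flags &= ~TOY_PG_ANON_EXCLUSIVE;
	page->mapping = 0;
}
```

The point of the sketch is that every reader of the shared bit has to check
for an anonymous page first, which is why code that inspects PG_slab by hand
needs to learn about the new meaning.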

I'm currently testing with the following. My tests so far, with a swapfile on
all different kinds of weird filesystems (excluding networking fs, though),
revealed no surprises: