Re: [PATCH RFC 5/6] mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap()
From: David Hildenbrand
Date: Mon Jan 29 2024 - 05:07:13 EST
On 29.01.24 04:25, Chris Li wrote:
> Hi David and Barry,
>
> On Mon, Jan 22, 2024 at 10:49 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
>>> I have on my todo list to move all that !anon handling out of
>>> folio_add_anon_rmap_ptes(), and instead make swapin code call
>>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag
>>> then (-> whole new folio exclusive).
>>>
>>> That's the cleaner approach.
>> one tricky thing is that sometimes it is hard to know who is the first
>> one to add the rmap and thus should call folio_add_new_anon_rmap().
>> especially when we want to support swapin_readahead(), the one who
>> allocated the large folio might not be the one who first does the rmap.
> I think Barry has a point. Two tasks might race to swap in the folio and
> then race to perform the rmap.
>
> folio_add_new_anon_rmap() should only be called on a folio that is
> absolutely "new", not shared. The sharing in the swap cache disqualifies
> that condition.
We have to hold the folio lock. So only one task at a time might do the
folio_add_anon_rmap_ptes() right now, and the
folio_add_new_shared_anon_rmap() in the future [below].
Also observe how folio_add_anon_rmap_ptes() states that one must hold
the page lock, because otherwise this would all be completely racy.
From the pte swp exclusive flags, we know for sure whether we are
dealing with exclusive vs. shared. I think patch #6 does not properly
check that all entries are actually the same in that regard (all
exclusive vs all shared). That likely needs fixing.
[I have converting per-page PageAnonExclusive flags to a single
per-folio flag on my todo list. I suspect that we'll keep the
per-swp-pte exclusive bits, but the question is rather what we can
actually make work, because swap and migration just make it much more
complicated. Anyhow, future work]
>> is it an acceptable way to do the below in do_swap_page?
>>
>> if (!folio_test_anon(folio))
>>         folio_add_new_anon_rmap()
>> else
>>         folio_add_anon_rmap_ptes()
> I am curious to know the answer as well.
Yes, the end code should likely be something like:
/* ksm created a completely new copy */
if (unlikely(folio != swapcache && swapcache)) {
        folio_add_new_anon_rmap(folio, vma, vmf->address);
        folio_add_lru_vma(folio, vma);
} else if (folio_test_anon(folio)) {
        folio_add_anon_rmap_ptes(rmap_flags)
} else {
        folio_add_new_anon_rmap(rmap_flags)
}
Maybe we want to avoid teaching all existing folio_add_new_anon_rmap()
callers about a new flag, and just have a new
folio_add_new_shared_anon_rmap() instead. TBD.
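(Purely as a sketch of how the abbreviated calls above might expand, using the folio_add_anon_rmap_ptes() signature from the rmap batching work; `page`, `nr_pages` and `rmap_flags` are assumed to come from earlier in the fault path, and this is not a tested patch.)

```c
/* Sketch only, not a tested patch. */
if (unlikely(folio != swapcache && swapcache)) {
	/* ksm created a completely new, exclusive copy */
	folio_add_new_anon_rmap(folio, vma, vmf->address);
	folio_add_lru_vma(folio, vma);
} else if (folio_test_anon(folio)) {
	/* someone already made this folio anon; the folio lock serializes us */
	folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, vmf->address,
				 rmap_flags);
} else {
	/*
	 * First rmap of this folio; as discussed, this call would need an
	 * exclusive flag (or a folio_add_new_shared_anon_rmap() variant)
	 * to handle the shared-in-swapcache case.
	 */
	folio_add_new_anon_rmap(folio, vma, vmf->address);
}
```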
> BTW, that test might have a race as well. By the time the task gets the
> !anon result, the result might get changed by another task. We need to
> make sure that race can't happen in the caller's context. Otherwise we
> can't do the above safely.
Again, folio lock. Observe the folio_lock_or_retry() call that covers
our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls.
--
Cheers,
David / dhildenb