Re: [PATCH 01/13] mm: Update ptep_get_lockless()'s comment

From: Nadav Amit
Date: Sat Oct 29 2022 - 14:05:23 EST


On Oct 28, 2022, at 5:42 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> I think the proper fix (or at least _a_ proper fix) would be to
> actually carry the dirty bit along to the __tlb_remove_page() point,
> and actually treat it exactly the same way as the page pointer itself
> - set the page dirty after the TLB flush, the same way we can free the
> page after the TLB flush.
>
> We could easily hide said dirty bit in the low bits of the
> "batch->pages[]" array or something like that. We'd just have to add
> the 'dirty' argument to __tlb_remove_page_size() and friends.
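A rough sketch of the suggestion above, assuming only that page pointers are at least 2-byte aligned so their low bit is always zero. `struct page`, `struct page_batch`, `FAKE_PG_DIRTY` and the helpers are hypothetical stand-ins, not the real mmu_gather API. The sketch tags each batched pointer with a dirty bit and applies that bit only after the (elided) TLB flush:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page; alignment of long keeps bit 0 free. */
struct page {
    long flags;
};

#define PAGE_DIRTY_TAG ((uintptr_t)1) /* low pointer bit reused as "dirty" */
#define FAKE_PG_DIRTY  0x1L           /* hypothetical stand-in for PG_dirty */
#define BATCH_MAX      8

/* Hypothetical batch, loosely modelled on batch->pages[]. */
struct page_batch {
    unsigned int nr;
    uintptr_t pages[BATCH_MAX]; /* page pointers with the dirty bit folded in */
};

/* Queue a page together with its dirty state; false means "flush first". */
static bool batch_add(struct page_batch *b, struct page *page, bool dirty)
{
    uintptr_t v = (uintptr_t)page;

    assert((v & PAGE_DIRTY_TAG) == 0); /* relies on pointer alignment */
    if (b->nr == BATCH_MAX)
        return false;
    b->pages[b->nr++] = v | (dirty ? PAGE_DIRTY_TAG : 0);
    return true;
}

/* Recover the untagged page pointer. */
static struct page *batch_page(const struct page_batch *b, unsigned int i)
{
    return (struct page *)(b->pages[i] & ~PAGE_DIRTY_TAG);
}

/* Read back the dirty bit stored alongside the pointer. */
static bool batch_dirty(const struct page_batch *b, unsigned int i)
{
    return (b->pages[i] & PAGE_DIRTY_TAG) != 0;
}

/* Called once the TLB flush has completed: mark dirty pages, then release. */
static void batch_finish(struct page_batch *b)
{
    /* The TLB flush itself would happen here, before touching the pages. */
    for (unsigned int i = 0; i < b->nr; i++) {
        if (batch_dirty(b, i))
            batch_page(b, i)->flags |= FAKE_PG_DIRTY; /* stands in for set_page_dirty() */
        /* The page reference would be dropped here. */
    }
    b->nr = 0;
}
```

The point of the ordering is that dirtying, like freeing, happens strictly after the flush, so no CPU can still write through a stale TLB entry once the dirty state is recorded.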

Thank you for your quick response. I was slow to respond due to jet lag.

Anyhow, I am not sure whether the solution that you propose would work.
Please let me know if my understanding makes sense.

Let’s assume that we do not call set_page_dirty() before we remove the rmap,
but only after the TLB invalidation [*]. Let’s also assume that
shrink_page_list() is called after the page’s rmap is removed and the page
is no longer mapped, but before set_page_dirty() has actually been called.

In such a case, shrink_page_list() would consider the page clean, and would
indeed keep the page (since __remove_mapping() would find an elevated page
refcount), which appears to give us a chance to mark the page as dirty
later.

However, IIUC, in this case shrink_page_list() might still call
filemap_release_folio() and release the buffers, so calling set_page_dirty()
afterwards (i.e., after the actual TLB invalidation has taken place) would fail.
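To make the ordering concrete, here is a toy model of the interleaving described above. It is not kernel code: `struct toy_page` and all helpers are hypothetical, and, following the reasoning in this mail, the model simply treats dirtying a page whose buffers were already released as a failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page state; only the fields relevant to the race are modelled. */
struct toy_page {
    bool mapped;      /* rmap still present */
    bool dirty;       /* stands in for PG_dirty */
    bool has_buffers; /* buffers attached to the page */
    int refcount;     /* extra reference is held by the pending TLB batch */
};

/* Step 1: zap removes the rmap but defers dirtying until after the flush. */
static void zap_remove_rmap(struct toy_page *p)
{
    p->mapped = false;
}

/* Step 2: reclaim runs; returns true if it would free the page. */
static bool toy_shrink(struct toy_page *p)
{
    if (p->mapped || p->dirty)
        return false;          /* mapped or dirty pages are skipped here */
    p->has_buffers = false;    /* stands in for filemap_release_folio() */
    return p->refcount == 1;   /* stands in for __remove_mapping()'s refcount check */
}

/* Step 3: deferred dirtying after the TLB flush (assumed to fail without buffers). */
static bool toy_set_dirty(struct toy_page *p)
{
    if (!p->has_buffers)
        return false;
    p->dirty = true;
    return true;
}
```

Running the steps in order 1, 2, 3 on a mapped page with refcount 2 shows the problem: reclaim keeps the page because of the extra reference, yet the buffers are already gone by the time the deferred dirtying runs.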

> Your idea of "do the page_remove_rmap() late instead" would also work,
> but the reason I think just squirrelling away the dirty bit is the
> "proper" fix is that it would get rid of the whole need for
> 'force_flush' in this area entirely. So we'd not only fix that race
> you noticed, we'd actually do so and reduce the number of TLB flushes
> too.

I’m all for reducing the number of TLB flushes, and your solution does sound
better in general. I proposed what I considered the path of least
resistance (i.e., the least chance of breaking something). I can do what
you proposed, but I am not sure how to deal with the buffers being removed.

One more note: I think this issue also affects migrate_vma_collect_pmd().
Alistair recently addressed a problem there, but I missed this one in my
earlier feedback to him.


[*] Note that, for our scenario, this would be pretty much the same even if we
called set_page_dirty() before removing the rmap, but the page was cleaned
before the TLB invalidation was performed.