Re: [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup

From: Nadav Amit
Date: Tue Jan 05 2021 - 14:27:51 EST


> On Jan 5, 2021, at 10:20 AM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
>
> On Fri, Dec 25, 2020 at 01:25:29AM -0800, Nadav Amit wrote:
>> Fixes: 0f8975ec4db2 ("mm: soft-dirty bits for user memory changes tracking")
>
> Targeting a backport down to 2013 when nothing could wrong in practice
> with page_mapcount sounds backwards and unnecessarily risky.
>
> In theory it was already broken and in theory
> 09854ba94c6aad7886996bfbee2530b3d8a7f4f4 is absolutely perfect and the
> previous code of 2013 is completely wrong, but in practice the code
> from 2013 worked perfectly until Aug 21 2020.

Well… If you consider the bug that Will recently fixed [1], then soft-dirty
was broken (for a different, yet related reason) since 0758cd830494
("asm-generic/tlb: avoid potential double flush”).

This is not to say that I argue that the patch should be backported to 2013,
just to say that memory corruption bugs can be unnoticed.

[1] https://patchwork.kernel.org/project/linux-mm/patch/20201210121110.10094-2-will@xxxxxxxxxx/

>
> Since nothing at all could go wrong in soft dirty and uffd-wp until
> 09854ba94c6aad7886996bfbee2530b3d8a7f4f4, the Fixes need to target
> that, definitely not a patch from 2013.
>
> This means the backports will apply clean, they don't need a simple
> solution but one that doesn't regress the performance of open source
> virtual machines and open source products using clear_refs and uffd-wp
> in general.

To summarize my action items based your (and others) feedback on both
patches:

1. I will break the first patch into two different patches, one with the
“optimization” for write-unprotect, based on your feedback. It will not
be backported.

2. I will try to add a patch to avoid TLB flushes on
userfaultfd-writeunprotect. It will also not be backported.

3. Let me know if you want me to use your version of testing
mm_tlb_flush_pending() and conditionally flushing, wait for new version fro
you or Peter or to go with taking mmap_lock for write.

Thanks again,
Nadav