Re: [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup

From: Will Deacon
Date: Tue Jan 05 2021 - 17:17:32 EST


On Tue, Jan 05, 2021 at 09:22:51PM +0000, Nadav Amit wrote:
> > On Jan 5, 2021, at 12:39 PM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
> >
> > On Tue, Jan 05, 2021 at 07:26:43PM +0000, Nadav Amit wrote:
> >>> On Jan 5, 2021, at 10:20 AM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
> >>>
> >>> On Fri, Dec 25, 2020 at 01:25:29AM -0800, Nadav Amit wrote:
> >>>> Fixes: 0f8975ec4db2 ("mm: soft-dirty bits for user memory changes tracking")
> >>>
> >>> Targeting a backport down to 2013 when nothing could wrong in practice
> >>> with page_mapcount sounds backwards and unnecessarily risky.
> >>>
> >>> In theory it was already broken and in theory
> >>> 09854ba94c6aad7886996bfbee2530b3d8a7f4f4 is absolutely perfect and the
> >>> previous code of 2013 is completely wrong, but in practice the code
> >>> from 2013 worked perfectly until Aug 21 2020.
> >>
> >> Well… If you consider the bug that Will recently fixed [1], then soft-dirty
> >> was broken (for a different, yet related reason) since 0758cd830494
> >> ("asm-generic/tlb: avoid potential double flush”).
> >>
> >> This is not to say that I argue that the patch should be backported to 2013,
> >> just to say that memory corruption bugs can be unnoticed.
> >>
> >> [1] https://patchwork.kernel.org/project/linux-mm/patch/20201210121110.10094-2-will@xxxxxxxxxx/
> >
> > Is this a fix or a cleanup?
> >
> > The above is precisely what I said earlier that tlb_gather had no
> > reason to stay in clear_refs and it had to use inc_tlb_flush_pending
> > as mprotect, but it's not a fix? Is it? I suggested it as a pure
> > cleanup. So again no backport required. The commit says fix this but
> > it means "clean this up".
>
> It is actually a fix. I think the commit log is not entirely correct and
> should include:
>
> Fixes: 0758cd830494 ("asm-generic/tlb: avoid potential double flush”).
>
> Since 0758cd830494, calling tlb_finish_mmu() without any previous call to
> pte_free_tlb() and friends does not flush the TLB. The soft-dirty bug
> producer that I sent fails without this patch of Will.

Yes, it's a fix, but I didn't rush it for 5.10 because I don't think rushing
this sort of thing does anybody any favours. I agree that the commit log
should be updated; I mentioned this report in the cover letter:

https://lore.kernel.org/linux-mm/CA+32v5zzFYJQ7eHfJP-2OHeR+6p5PZsX=RDJNU6vGF3hLO+j-g@xxxxxxxxxxxxxx/

demonstrating that somebody has independently stumbled over the missing TLB
invalidation in userspace, but it's not as bad as the other issues we've been
discussing in this thread.

Will