Re: [PATCH 1/1] mm: lock VMAs skipped by a failed queue_pages_range()
From: Yang Shi
Date: Tue Sep 19 2023 - 17:09:58 EST
On Tue, Sep 19, 2023 at 1:53 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 18-09-23 14:16:08, Suren Baghdasaryan wrote:
> > When queue_pages_range() encounters an unmovable page, it terminates
> > its page walk. This walk, among other things, locks the VMAs in the range.
> > This termination might result in some VMAs being left unlock after
> > queue_pages_range() completes. Since do_mbind() continues to operate on
> > these VMAs despite the failure from queue_pages_range(), it will encounter
> > an unlocked VMA.
> > This mbind() behavior has been modified several times before and might
> > need some changes to either finish the page walk even in the presence
> > of unmovable pages or to error out immediately after the failure to
> > queue_pages_range(). However that requires more discussions, so to
> > fix the immediate issue, explicitly lock the VMAs in the range if
> > queue_pages_range() failed. The added condition does not save much
> > but is added for documentation purposes to understand when this extra
> > locking is needed.
>
> The semantic of the walk in this case is really clear as mud. I was
> trying to reconstruct the whole picture and it really hurts... Then I
> found http://lkml.kernel.org/r/CAHbLzkrmTaqBRmHVdE2kyW57Uoghqd_E+jAXC9cB5ofkhL-uvw@xxxxxxxxxxxxxx
> and that helped a lot. Let's keep it a reference at least in the email
> thread here for future.
FYI, I'm working on a fix for the regression mentioned in that series,
and Hugh has some clean up and enhancement for that too.
>
> > Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
> > Reported-by: syzbot+b591856e0f0139f83023@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Closes: https://lore.kernel.org/all/000000000000f392a60604a65085@xxxxxxxxxx/
> > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
>
> I cannot say I like the patch (it looks like a potential double locking
> unless you realize this lock is special) but considering this might be just
> temporal I do not mind.
>
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
>
> Thanks!
>
> > ---
> > mm/mempolicy.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 42b5567e3773..cbc584e9b6ca 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
> > vma_iter_init(&vmi, mm, start);
> > prev = vma_prev(&vmi);
> > for_each_vma_range(vmi, vma, end) {
> > + /* If queue_pages_range failed then not all VMAs might be locked */
> > + if (ret)
> > + vma_start_write(vma);
> > err = mbind_range(&vmi, vma, &prev, start, end, new);
> > if (err)
> > break;
> > --
> > 2.42.0.459.ge4e396fd5e-goog
>
> --
> Michal Hocko
> SUSE Labs