Re: [syzbot] [mm?] kernel BUG in vma_replace_policy

From: Hugh Dickins
Date: Fri Sep 15 2023 - 23:17:56 EST


On Fri, 15 Sep 2023, Matthew Wilcox wrote:
> On Thu, Sep 14, 2023 at 09:26:15PM -0700, Hugh Dickins wrote:
> > On Thu, 14 Sep 2023, Suren Baghdasaryan wrote:
> > > Yes, I just finished running the reproducer on both upstream and
> > > linux-next builds listed in
> > > https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023 and the
> > > problem does not happen anymore.
> > > I'm fine with your suggestion too; I just wanted to point out that it
> > > would introduce a change in behavior. Let me know how you want to proceed.
> >
> > Well done, identifying the mysterious cause of this problem:
> > I'm glad to hear that you've now verified that hypothesis.
> >
> > You're right, it would be a regression to follow Matthew's suggestion.
> >
> > Traditionally, modulo bugs and inconsistencies, the queue_pages_range()
> > phase of do_mbind() has done the best it can, gathering all the pages it
> > can that need migration, even if some were missed; and proceeds to do the
> > mbind_range() phase if there was nothing "seriously" wrong (a gap causing
> > -EFAULT). Then at the end, if MPOL_MF_STRICT was set, and not all the
> > pages could be migrated (or MOVE was not specified and not all pages
> > were well placed), it returns -EIO rather than 0 to inform the caller
> > that not all could be done.
> >
> > There have been numerous tweaks, but I think most importantly
> > 5.3's d883544515aa ("mm: mempolicy: make the behavior consistent when
> > MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") added those "return 1"s
> > which stop the pagewalk early. In my opinion, not an improvement - makes
> > it harder to get mbind() to do the best job it can (or is it justified as
> > what you're asking for if you say STRICT?).
>
> I suspect you agree that it's inconsistent to stop early. Userspace
> doesn't know at which point we found an unmovable page, so it can't behave
> rationally. Perhaps we should remove the 'early stop' and attempt to
> migrate every page in the range, whether it's before or after the first
> unmovable page?

Yes, that's what I was arguing for, and how it was done in olden days.
Though (after Yang Shi's following comments, and looking back at my
last attempted patch here) I may disagree with myself about the right
behavior in the MPOL_MF_STRICT case.

Hugh
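
The two pagewalk behaviors debated above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
kernel's mm/mempolicy.c code: struct walk_result, walk_pages() and
strict_status() are hypothetical names invented here, and a "page" is
reduced to a single movable/unmovable flag. It shows that stopping at the
first unmovable page leaves later movable pages unqueued, while a
best-effort walk queues them, and that a STRICT-style caller would see
-EIO in either case.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one walk over a range of pages. */
struct walk_result {
	size_t visited;	/* pages examined before the walk ended */
	size_t queued;	/* movable pages gathered for migration */
	size_t skipped;	/* unmovable pages encountered */
};

/*
 * Model of the queueing phase. With stop_early set, the walk ends at
 * the first unmovable page (loosely analogous to the "return 1" early
 * exit mentioned in the thread); otherwise it visits every page and
 * queues all that can be moved.
 */
static struct walk_result walk_pages(const bool *movable, size_t n,
				     bool stop_early)
{
	struct walk_result r = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		r.visited++;
		if (movable[i]) {
			r.queued++;
			continue;
		}
		r.skipped++;
		if (stop_early)
			break;
	}
	return r;
}

/*
 * Model of the final status: a strict caller is told -EIO whenever
 * some page could not be handled; a non-strict caller gets 0.
 */
static int strict_status(struct walk_result r, bool strict)
{
	return (strict && r.skipped > 0) ? -EIO : 0;
}
```

For a five-page range in which only the third page is unmovable, the
early-stop walk queues the first two pages and never looks at the last
two, whereas the best-effort walk queues four. Both report -EIO to a
strict caller, so in this model the return value alone does not tell
userspace how far the walk got.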