Re: [syzbot] [mm?] kernel BUG in vma_replace_policy

From: Matthew Wilcox
Date: Thu Sep 14 2023 - 15:09:11 EST


On Thu, Sep 14, 2023 at 06:20:56PM +0000, Suren Baghdasaryan wrote:
> I think I found the problem and the explanation is much simpler. While
> walking the page range, queue_folios_pte_range() encounters an
> unmovable page and queue_folios_pte_range() returns 1. That causes a
> break from the loop inside walk_page_range() and no more VMAs get
> locked. After that the loop calling mbind_range() walks over all VMAs,
> even the ones which were skipped by queue_folios_pte_range() and that
> causes this BUG assertion.
>
> Thinking what's the right way to handle this situation (what's the
> expected behavior here)...
> I think the safest way would be to modify walk_page_range() and make
> it continue calling process_vma_walk_lock() for all VMAs in the range
> even when __walk_page_range() returns a positive err. Any objection or
> alternative suggestions?

So we only return 1 here if MPOL_MF_MOVE* & MPOL_MF_STRICT were
specified. That means we're going to return an error, no matter what,
and there's no point in calling mbind_range(). Right?

+++ b/mm/mempolicy.c
@@ -1334,6 +1334,8 @@ static long do_mbind(unsigned long start, unsigned long len,
 	ret = queue_pages_range(mm, start, end, nmask,
 			  flags | MPOL_MF_INVERT, &pagelist, true);

+	if (ret == 1)
+		ret = -EIO;
 	if (ret < 0) {
 		err = ret;
 		goto up_out;

(I don't really understand this code, so it can't be this simple, can
it? Why don't we just return -EIO from queue_folios_pte_range() if
this is the right answer?)
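
For anyone trying to follow the failure mode, here is a rough user-space
toy model of it. This is not kernel code: toy_vma, toy_walk(),
toy_apply_policy() and toy_mbind() are hypothetical names invented for
illustration, and the model assumes (as described above) that the walk
stops locking VMAs as soon as a callback returns 1, and that an error is
reported to the caller either way.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA; only the two facts we care about. */
struct toy_vma {
	bool has_unmovable_page;
	bool write_locked;
};

/*
 * Toy version of the page walk: lock each VMA before visiting it and
 * stop at the first VMA whose callback reports an unmovable page.
 * VMAs after that point are never locked.
 */
static int toy_walk(struct toy_vma *vmas, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		vmas[i].write_locked = true;
		if (vmas[i].has_unmovable_page)
			return 1;
	}
	return 0;
}

/*
 * Toy version of the policy-replacement loop: it visits every VMA in
 * the range and counts those that were never locked. Each one counted
 * here stands for a place where the real assertion would fire.
 */
static size_t toy_apply_policy(const struct toy_vma *vmas, size_t n)
{
	size_t unlocked = 0;

	for (size_t i = 0; i < n; i++)
		if (!vmas[i].write_locked)
			unlocked++;
	return unlocked;
}

/*
 * Toy version of the caller. With map_positive_to_eio set, a return of
 * 1 from the walk becomes -EIO and we bail out before touching any VMA,
 * mirroring the two-line change proposed above.
 */
static int toy_mbind(struct toy_vma *vmas, size_t n,
		     bool map_positive_to_eio, size_t *unlocked_touched)
{
	int ret;

	assert(unlocked_touched != NULL);
	*unlocked_touched = 0;

	ret = toy_walk(vmas, n);
	if (map_positive_to_eio && ret == 1)
		ret = -EIO;
	if (ret < 0)
		return ret;

	*unlocked_touched = toy_apply_policy(vmas, n);
	/* Assumption from the message: the caller sees an error either way. */
	return ret > 0 ? -EIO : 0;
}
```

With three VMAs where the middle one holds an unmovable page, calling
toy_mbind() without the mapping touches one unlocked VMA (the last one)
before reporting the error; with the mapping it reports the same error
and touches none. Whether the real code has other callers or flag
combinations that depend on the positive return value is exactly the
open question above, and the toy model says nothing about that.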