Re: [PATCH] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file

From: Zach O'Keefe
Date: Wed Mar 29 2023 - 20:58:44 EST


On Wed, Mar 29, 2023 at 5:14 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Wed, Mar 29, 2023 at 2:53 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 29 Mar 2023 18:53:30 +0400 Ivan Orlov <ivan.orlov0322@xxxxxxxxx> wrote:
> >
> > > Syzkaller reported the following issue:
> > >
> > > ...
> > >
> > > The 'xas_store' call during page cache scanning can potentially
> > > translate 'xas' into the error state (with the reproducer provided
> > > by the syzkaller the error code is -ENOMEM). However, there are no
> > > further checks after the 'xas_store', and the next call of 'xas_next'
> > > at the start of the scanning cycle doesn't increase the xa_index,
> > > and the issue occurs.
> > >
> > > This patch will add the xarray state error checking after the
> > > 'xas_store' and the corresponding result error code.
> > >
> > > Tested via syzbot.
> > >
> > > Reported-by: syzbot+9578faa5475acb35fa50@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc
> > > Signed-off-by: Ivan Orlov <ivan.orlov0322@xxxxxxxxx>
> > > ---
> > > mm/khugepaged.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 92e6f56a932d..4d9850d9ea7f 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -55,6 +55,7 @@ enum scan_result {
> > > SCAN_CGROUP_CHARGE_FAIL,
> > > SCAN_TRUNCATED,
> > > SCAN_PAGE_HAS_PRIVATE,
> > > + SCAN_STORE_FAILED,
> > > };
> > >
> > > #define CREATE_TRACE_POINTS
> > > @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > > goto xa_locked;
> > > }
> > > xas_store(&xas, hpage);
> > > + if (xas_error(&xas)) {
> > > + /* revert shmem_charge performed
> > > + * in the previous condition
> > > + */
> > > + mapping->nrpages--;
> > > + shmem_uncharge(mapping->host, 1);
> > > + result = SCAN_STORE_FAILED;
> > > + goto xa_locked;
> > > + }
> > > nr_none++;
> > > continue;
> > > }
> >
> > Needs this, I assume.
> >
> > --- a/include/trace/events/huge_memory.h~mm-khugepaged-fix-kernel-bug-in-hpage_collapse_scan_file-fix
> > +++ a/include/trace/events/huge_memory.h
> > @@ -36,7 +36,8 @@
> > EM( SCAN_ALLOC_HUGE_PAGE_FAIL, "alloc_huge_page_failed") \
> > EM( SCAN_CGROUP_CHARGE_FAIL, "ccgroup_charge_failed") \
> > EM( SCAN_TRUNCATED, "truncated") \
> > - EMe(SCAN_PAGE_HAS_PRIVATE, "page_has_private") \
> > + EM( SCAN_PAGE_HAS_PRIVATE, "page_has_private") \
> > + EMe(SCAN_STORE_FAILED, "store_failed")
>
> I'm a little bit reluctant to make the error code list longer, can we
> just return SCAN_FAIL? IIUC this issue should happen very rarely,
> maybe not worth a new error code.
>
> Basically the rollback approach makes sense to me. IIRC Zach was
> looking into the same problem, loop him in. He may share some
> thoughts.

Thanks Yang, appreciate being brought into the loop. One of the things
I plan to do during paternity leave is update my email filters so I
don't miss things like this.

Coincidentally, Hugh also just brought this to my attention. Looks to
be the syzkaller report posted a few weeks ago[1].

Given that there are (or were) two series touching this path right now,
I was trying to find time to review them first and then post a fix
on top, if necessary (or it could have been incorporated into David
Stevens' "mm/khugepaged: fix khugepaged+shmem races" series).

But, I'm perennially behind and haven't been able to find time to do
those reviews, and so my "fix" attempt has sat. Thanks, Ivan, for
picking up the slack.

So, I did test this patch with the syzbot reproducer, and everything
looked good :) Thank you.

I have similar reservations about making the error code list
longer, unless there is an opportunity to combine other failure sites
under a common umbrella. For example, I was debating whether a SCAN_OOM
error was worthy of inclusion, which we could use in
__collapse_huge_page_swapin() on VM_FAULT_OOM. I personally went the
route of saying, "no, just use SCAN_FAIL".
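
As a rough illustration of the "just use SCAN_FAIL" route, here is a
minimal userspace sketch. It is not kernel code: the enum names and
sketch_result_from_cause() are hypothetical and only model the idea of
folding rare allocation failures into one generic result code instead
of adding a new code per failure site.

```c
#include <assert.h>

/* Hypothetical, trimmed-down result codes; not the kernel's enum scan_result. */
enum sketch_result { SKETCH_SUCCEED, SKETCH_FAIL, SKETCH_TRUNCATED };

/* Hypothetical low-level causes that callers rarely need to tell apart. */
enum sketch_cause { CAUSE_NONE, CAUSE_STORE_ENOMEM, CAUSE_FAULT_OOM };

/* Fold every rare allocation failure into the generic failure code. */
static enum sketch_result sketch_result_from_cause(enum sketch_cause cause)
{
	switch (cause) {
	case CAUSE_NONE:
		return SKETCH_SUCCEED;
	case CAUSE_STORE_ENOMEM:
	case CAUSE_FAULT_OOM:
		return SKETCH_FAIL;
	}
	/* Unknown causes are also reported as a generic failure. */
	return SKETCH_FAIL;
}
```

The trade-off is that a trace consumer can no longer tell the two
causes apart, which seems acceptable if both are expected to be rare.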

There also ought to be some comments somewhere (either in the code or
in the commit description) about why this is the only xas_store() site
that deserves special error handling. I was planning to suggest
sprinkling a few VM_BUG_ON()s after some of these sites, with a
comment, just in case the implementation of xarray changes and
operations which previously didn't require allocating memory now do
so. At least for me, it took work to sort this out, so I don't think
it's obvious.
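
The distinction between the two kinds of store site can be sketched in
userspace as follows. This is only a model under stated assumptions:
struct fake_xas, fake_store() and fake_error() are hypothetical
stand-ins for struct xa_state, xas_store() and xas_error(), the
nrpages counter stands in for the accounting done before the store,
and assert() plays the role of VM_BUG_ON(). It assumes that a store
into an already-populated slot does not allocate.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xa_state; only the error is modelled. */
struct fake_xas {
	int err;          /* 0, or a negative errno once a store has failed */
	bool need_alloc;  /* true if storing to this slot must allocate */
	bool alloc_fails; /* simulates memory pressure */
};

/* Hypothetical stand-in for xas_store(): may leave the state in error. */
static void fake_store(struct fake_xas *xas)
{
	if (xas->need_alloc && xas->alloc_fails)
		xas->err = -ENOMEM;
}

/* Hypothetical stand-in for xas_error(). */
static int fake_error(const struct fake_xas *xas)
{
	return xas->err;
}

enum fake_result { FAKE_SUCCEED, FAKE_FAIL };

/*
 * Site that may allocate: check the error after the store and undo the
 * accounting that was done just before it.
 */
static enum fake_result store_may_allocate(struct fake_xas *xas, long *nrpages)
{
	(*nrpages)++;            /* models the charge taken before the store */
	fake_store(xas);
	if (fake_error(xas)) {
		(*nrpages)--;    /* roll the charge back */
		return FAKE_FAIL;
	}
	return FAKE_SUCCEED;
}

/*
 * Site assumed not to allocate: record that assumption as an assertion
 * so a future change in the underlying structure is caught early.
 */
static void store_must_not_fail(struct fake_xas *xas)
{
	xas->need_alloc = false; /* slot is already populated in this model */
	fake_store(xas);
	assert(!fake_error(xas));
}
```

In this model only store_may_allocate() needs a rollback path; the
assertion in store_must_not_fail() documents why the other site has
none.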

Now, as mentioned, I'm headed on paternity leave starting Friday,
until July 12. So, if there is a v2, I'm likely to miss it, and even
Cc'ing me isn't likely to get a response :) As such, feel free to have
my

Tested-by: Zach O'Keefe <zokeefe@xxxxxxxxxx>

now, since I've validated it works. My understanding is that no other
callsites need attention, so I believe this bug is "fixed" -- all that
remains is dealing with the error codes, comments, assertions, etc.

Thanks again, Ivan,

Best,
Zach

[1] https://lore.kernel.org/linux-mm/000000000000226a6105f6954b47@xxxxxxxxxx/

> >
> > #undef EM
> > #undef EMe
> > _
> >
> >