Re: [v2 PATCH] fs/proc: task_mmu.c: don't read mapcount for migration entry
From: Jann Horn
Date: Wed Jan 26 2022 - 06:49:01 EST

On Wed, Jan 26, 2022 at 12:38 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> On 26.01.22 12:29, Jann Horn wrote:
> > On Wed, Jan 26, 2022 at 11:51 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >> On 20.01.22 21:28, Yang Shi wrote:
> >>> The syzbot reported the below BUG:
> >>>
> >>> kernel BUG at include/linux/page-flags.h:785!
[...]
> >>> RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline]
> >>> RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744
[...]
> >> Does this point at the bigger issue that reading the mapcount without
> >> having the page locked is completely unstable?
> >
> > (See also https://lore.kernel.org/all/CAG48ez0M=iwJu=Q8yUQHD-+eZDg6ZF8QCF86Sb=CN1petP=Y0Q@xxxxxxxxxxxxxx/
> > for context.)
>
> Thanks for the pointer.
>
> >
> > I'm not sure what you mean by "unstable". Do you mean "the result is
> > not guaranteed to still be valid when the call returns", "the result
> > might not have ever been valid", or "the call might crash because the
> > page's state as a compound page is unstable"?
>
> A little bit of everything :)
[...]
> > In case you mean "the result might not have ever been valid":
> > Yes, even with this patch applied, in theory concurrent THP splits
> > could cause us to count some page mappings twice. Arguably that's not
> > entirely correct.
>
> Yes, the snapshot is not atomic and, thereby, unreliable. That's what I
> mostly meant as "unstable".
>
> >
> > In case you mean "the call might crash because the page's state as a
> > compound page could concurrently change":
>
> I think that's just a side-product of the snapshot not being "correct",
> right?

I guess you could see it that way? The way I look at it is that
page_mapcount() is designed to return a number that's at least as high
as the number of mappings (rarely higher due to races), and using
page_mapcount() on an unlocked page is legitimate if you're fine with
the rare double-counting of references. In my view, the problem here
is:

There are different types of references to "struct page" - some of
them allow you to call page_mapcount(), some don't. And in particular,
get_page() doesn't give you a reference that can be used with
page_mapcount(), but locking a (real, non-migration) PTE pointing to
the page does give you such a reference.

This concept of "different types of references" is the same as you
e.g. get with mmgrab() vs mmget() - they both give references to the
same object, but those references have different usage restrictions.
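
To make the mmgrab()/mmget() analogy concrete, here is a rough userspace
toy model. It is only a sketch, not kernel code: every toy_* name is made
up for illustration, there is no atomicity or locking, and "freeing" is
modelled with a flag. It only assumes the usual split where one counter
keeps the struct itself allocated (like mm_count) and another keeps the
address space usable (like mm_users), with all users together holding a
single reference on the first counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, single-threaded, no real freeing. */
struct toy_mm {
	int users;			/* mmget()-style refs: mappings may be used */
	int count;			/* mmgrab()-style refs: struct stays allocated */
	bool address_space_alive;	/* false once the last user is gone */
	bool struct_alive;		/* false once the last count ref is gone */
};

static void toy_mm_init(struct toy_mm *mm)
{
	mm->users = 1;
	mm->count = 1;	/* all users together hold one count reference */
	mm->address_space_alive = true;
	mm->struct_alive = true;
}

/* Weak reference: keeps the struct around, says nothing about mappings. */
static void toy_mmgrab(struct toy_mm *mm) { mm->count++; }

static void toy_mmdrop(struct toy_mm *mm)
{
	if (--mm->count == 0)
		mm->struct_alive = false;	/* stands in for freeing */
}

/* Strong reference: caller must already know users > 0. */
static void toy_mmget(struct toy_mm *mm) { mm->users++; }

/* Upgrade weak -> strong; fails once the address space is gone. */
static bool toy_mmget_not_zero(struct toy_mm *mm)
{
	if (mm->users == 0)
		return false;
	mm->users++;
	return true;
}

static void toy_mmput(struct toy_mm *mm)
{
	if (--mm->users == 0) {
		mm->address_space_alive = false;	/* mappings torn down */
		toy_mmdrop(mm);		/* drop the users' shared count ref */
	}
}

/* The "usage restriction": only valid while a strong ref is held. */
static bool toy_may_touch_mappings(const struct toy_mm *mm)
{
	return mm->struct_alive && mm->address_space_alive;
}

/* Walks through the case where only a weak reference survives. */
static int toy_demo(void)
{
	struct toy_mm mm;

	toy_mm_init(&mm);
	toy_mmgrab(&mm);	/* we hold a weak ref */
	toy_mmput(&mm);		/* the last user exits */

	assert(mm.struct_alive);		/* safe to look at the struct */
	assert(!toy_may_touch_mappings(&mm));	/* not safe to use mappings */
	assert(!toy_mmget_not_zero(&mm));	/* and too late to upgrade */

	toy_mmdrop(&mm);	/* now the struct goes away too */
	assert(!mm.struct_alive);
	return 0;
}
```

Calling toy_demo() from a main() runs through the scenario. The point is
the same as with get_page() vs. holding the PTE lock: both weak and strong
references point at the same object, but only one of them entitles you to
the operation you actually want to perform.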