Re: mm: BUG in khugepaged_scan_mm_slot

From: Andrea Arcangeli
Date: Mon Apr 04 2016 - 13:15:32 EST


Hello,

On Mon, Apr 04, 2016 at 03:06:25PM +0300, Kirill A. Shutemov wrote:
> On Mon, Apr 04, 2016 at 02:03:54PM +0200, Vlastimil Babka wrote:
> > [+CC Andrea]
> >
> > On 04/02/2016 11:48 AM, Dmitry Vyukov wrote:
> > >Hello,
> > >
> > >The following program triggers a BUG in khugepaged_scan_mm_slot:
> > >
> > >
> > >vma ffff880032698f90 start 0000000020c57000 end 0000000020c58000
> > >next ffff88003269a1b8 prev ffff88003269ac18 mm ffff88005e274780
> > >prot 35 anon_vma ffff88003182c000 vm_ops (null)
> > >pgoff fed00 file ffff8800324552c0 private_data (null)
> > >flags: 0x5144477(read|write|exec|mayread|maywrite|mayexec|pfnmap|io|dontexpand|account)
> > >------------[ cut here ]------------
> > >kernel BUG at mm/huge_memory.c:2313!
> > >invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
> >
> > That's VM_BUG_ON_VMA(vma->vm_flags & VM_NO_THP, vma) in
> > hugepage_vma_check().
> >
> > #define VM_NO_THP (VM_SPECIAL | VM_HUGETLB | VM_SHARED | VM_MAYSHARE)
> >
> > #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
> >
> > Of those, we have VM_IO | VM_DONTEXPAND.
> >
> > I don't know if it's valid for a vma with anon_vma to have such flags, if
> > yes, we should probably modify hugepage_vma_check(). Called from
> > khugepaged_scan_mm_slot() it should just return false on VM_NO_THP. Called
> > from collapse_huge_page() it could keep the VM_BUG_ON. Or maybe just have
> > VM_BUG_ON(!hugepage_vma_check()) there? Hmm actually no, there's a mmap_sem
> > release for read and then acquire for write, so we can't rely on the check
> > done earlier from khugepaged_scan_mm_slot().
> >
> > So we should probably just change the VM_BUG_ON to another "return false"
> > condition. Unless the VM_BUG_ON uncovered a real bug and the earlier
> > conditions in hugepage_vma_check() should guarantee the VM_BUG_ON be false
> > for any vma.
>
> http://lkml.kernel.org/r/145961146490.28194.16019687861681349309.stgit@zurg

That's not the only place that assumes vm_ops NULL means anonymous and
not VM_IO though, so I agree with Vlastimil we should think once more
about this fix, either that or extend it to other places.

I wonder if checking vm_ops was a mistake in the first place, and if
keeping the vm_ops check isn't the right fix. Wouldn't it be more
correct to apply a s/!vm_ops/!vm_file/, and not just there? What
problem would we then run into if we used !vm_file?

The assumption behind this vm_ops check is that it was safer than a
vm_file check, but clearly it isn't, as some chardev is not setting
vm_ops (don't they need vm_ops->close?). But all chardevs have vm_file
set, so if we could use that instead, we can retain the VM_BUG_ON or,
better, convert it to a graceful warn on that bails out.
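
To make the difference concrete, here is a rough userspace model of the
idea, not kernel code and not a patch. The toy_vma struct, the helper
names and the flag bit values are all made up for illustration (the
real values live in include/linux/mm.h); only the field names and the
VM_NO_THP/VM_SPECIAL compositions come from the report quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative bit values only, NOT the kernel's. */
#define VM_SHARED     0x01UL
#define VM_MAYSHARE   0x02UL
#define VM_PFNMAP     0x04UL
#define VM_IO         0x08UL
#define VM_DONTEXPAND 0x10UL
#define VM_MIXEDMAP   0x20UL
#define VM_HUGETLB    0x40UL

#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
#define VM_NO_THP  (VM_SPECIAL | VM_HUGETLB | VM_SHARED | VM_MAYSHARE)

/* Toy stand-in for struct vm_area_struct: only the fields discussed here. */
struct toy_vma {
	unsigned long vm_flags;
	const void *anon_vma;
	const void *vm_ops;
	const void *vm_file;
};

/* Existing assumption: anon_vma set and vm_ops NULL means anonymous. */
static bool looks_anon_by_ops(const struct toy_vma *vma)
{
	assert(vma != NULL);
	return vma->anon_vma != NULL && vma->vm_ops == NULL;
}

/* Proposed alternative: anon_vma set and vm_file NULL means anonymous. */
static bool looks_anon_by_file(const struct toy_vma *vma)
{
	assert(vma != NULL);
	return vma->anon_vma != NULL && vma->vm_file == NULL;
}

/* A vma shaped like the reported one: anon_vma and file set, vm_ops NULL. */
static struct toy_vma reported_vma(void)
{
	static int dummy_anon_vma, dummy_file;
	struct toy_vma vma = {
		.vm_flags = VM_PFNMAP | VM_IO | VM_DONTEXPAND,
		.anon_vma = &dummy_anon_vma,
		.vm_ops   = NULL,
		.vm_file  = &dummy_file,
	};
	return vma;
}

/*
 * Hypothetical check using !vm_file, with a warning that bails out
 * where the VM_BUG_ON used to be. Returns true only if THP may be used.
 */
static bool toy_hugepage_vma_check(const struct toy_vma *vma)
{
	if (!looks_anon_by_file(vma))
		return false;
	if (vma->vm_flags & VM_NO_THP) {
		fprintf(stderr, "unexpected VM_NO_THP flags %#lx on anon vma\n",
			vma->vm_flags & VM_NO_THP);
		return false;
	}
	return true;
}
```

In this model the reported vma passes the vm_ops test but is rejected
by the vm_file test, so it never reaches the flags check. A vma with
no file that still carries a VM_NO_THP flag only gets a warning and a
"false" instead of a crash.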

Thanks,
Andrea