Re: mm: kernel BUG at mm/memory.c:1230

From: Andrea Arcangeli
Date: Sat May 26 2012 - 19:56:35 EST


Hello everyone,

On Sat, May 26, 2012 at 01:26:48PM -0700, Hugh Dickins wrote:
> I've been round this loop before with that particular VM_BUG_ON.
>
> At first I thought like Andrew, that it's glaringly wrong on the exit
> path; but then changed my mind.
>
> When munmapping, we certainly can arrive here with an unaligned addr
> and next; but in that case rwsem_is_locked.
>
> Whereas in exiting, rwsem is not locked, but we're going linearly upwards,
> and whenever we walk into a pmd_trans_huge area, both addr and next should
> be hpage aligned: the vma bounds are unsuited to THP if they're unaligned.
>
> Other cases equally should not arise: madvise MADV_DONTNEED should
> have rwsem_is_locked; and truncation or hole-punching shouldn't be
> possible on a pure-anonymous (!vma->vm_ops) area considered for THP.
>
> But I cannot remember what brought me here before: a crash in testing
> on one of my machines, which further investigation root-caused elsewhere?
> or a report from someone else? or noticed when auditing another problem?
> I'm frustrated not to recall.

I agree it's not a false positive.

The reason I introduced that VM_BUG_ON was to verify if any
vma_adjust_trans_huge() was missing anywhere (so that it doesn't crash
later in split_huge_page with an obscure mapcount != page_mapcount
BUG_ON, there it would be much less obvious to see why it crashed than
here).

We should printk addr, end and the vma->vm_start/vm_end to debug this
further.

> > I'm not sure if that's indeed the issue or not, but note that this is
> > the first time I've managed to trigger that with the fuzzer, and it's
> > not that easy to reproduce. Which is a bit odd for code that was there
> > for 4 months...
>
> I'm keeping off the linux-next for the moment; I'll worry about this
> more if it shows up when we try 3.5-rc1. Your fuzzing tells that my
> logic above is wrong, but maybe it's just a passing defect in next.

If it's a missing vma_adjust_trans_huge() it shouldn't go unnoticed
even with DEBUG_VM=n, so I agree that if it only happens on linux-next
it's worth trying to reproduce it with 3.5-rc/3.4 too just in
case. It's actually the first time I hear of this bugcheck triggering.

Thanks!
Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/