Re: mm: BUG in unmap_page_range

From: Mel Gorman
Date: Tue Sep 09 2014 - 17:33:29 EST


On Mon, Sep 08, 2014 at 01:56:55PM -0400, Sasha Levin wrote:
> On 09/08/2014 01:18 PM, Mel Gorman wrote:
> > A worse possibility is that somehow the lock is getting corrupted but
> > that's also a tough sell considering that the locks should be allocated
> > from a dedicated cache. I guess I could try breaking that to allocate
> > one page per lock so DEBUG_PAGEALLOC triggers but I'm not very
> > optimistic.
>
> I did see ptl corruption couple days ago:
>
> https://lkml.org/lkml/2014/9/4/599
>
> Could this be related?
>

Possibly, although the likely explanation then would be that there is
just general corruption coming from somewhere. Even using your config
and applying a patch to make linux-next boot (already in Tejun's tree),
I was unable to reproduce the problem after running for several hours. I
had to run trinity on tmpfs as ext4 and xfs blew up almost immediately,
so I have a few questions.

1. What filesystem are you using?

2. What compiler are you using, in case it's an experimental one? I ask
because I think I saw a patch from you adding support so that the kernel
would build with gcc 5.

3. Does your hardware support TSX or anything similarly funky that would
potentially affect locking?

4. How many sockets are on your test machine, in case reproducing it
depends on a machine large enough to open a timing race?
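
For the TSX and socket-count questions, something like the following
would be enough to pull the answers out of /proc/cpuinfo. It is only a
rough sketch: it assumes an x86 machine where TSX shows up as the "hle"
and "rtm" cpu flags and each socket has its own "physical id" value, and
the function name summarize_cpuinfo is made up for illustration.

```python
import os


def summarize_cpuinfo(text):
    """Return (tsx_flags, socket_count) from /proc/cpuinfo-style text.

    Assumption: x86 layout, where TSX is reported as the "hle" and "rtm"
    flags and each socket has a distinct "physical id" value.
    """
    tsx_flags = set()
    sockets = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processors
        key = key.strip()
        value = value.strip()
        if key == "flags":
            # Keep only the TSX-related flags, if present.
            tsx_flags |= {"hle", "rtm"} & set(value.split())
        elif key == "physical id":
            sockets.add(value)
    return sorted(tsx_flags), len(sockets)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        tsx, nr_sockets = summarize_cpuinfo(f.read())
    print("TSX flags:", " ".join(tsx) or "none")
    print("sockets:", nr_sockets)
```

An empty TSX list only means the flags are not advertised; a socket
count of 0 means the kernel did not report "physical id" at all, which
can happen in some virtual machines.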

As I'm drawing a blank on what would trigger the bug, I'm hoping I can
reproduce this locally and experiment a bit.

Thanks.

--
Mel Gorman
SUSE Labs