Re: [BUG] Lockless patches cause hardlock under heavy IO

From: Ryan Hope
Date: Tue Jun 24 2008 - 11:57:24 EST


I can give you a list of patches that should correspond to the thread
name (for the most part):

fix-double-unlock_page-in-2626-rc5-mm3-kernel-bug-at-mm-filemapc-575.patch

fix_munlock-page-table-walk.patch

migration_entry_wait-fix.patch

PATCH collect lru meminfo statistics from correct offset

Mlocked field of /proc/meminfo display silly number.
because trivial mistake exist in meminfo_read_proc().

You can also look in our git repo to see the code that changed with
these patches if you can't track them down on LKML:
http://zen-sources.org/cgi-bin/gitweb.cgi?p=kernel-mm.git;a=shortlog;h=refs/heads/lkml

On Tue, Jun 24, 2008 at 11:32 AM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Jun 24, 2008 at 11:12:03AM -0400, Ryan Hope wrote:
>> Well, I tried to run pure -mm this weekend. It locked up as soon as I got
>> into GNOME, so I applied a couple of the bug fixes from LKML, and -mm
>> seems to be running stable now. I can't seem to get it to hard lock
>> now, at least not doing the simple stuff that was causing it to hard
>> lock on my other patchset. Either the lockless patches expose some bug
>> that is in -rc6, or lockless requires some other patches further up in the
>> -mm series file.
>
> Cool!!! Any guess as to which of the bug fixes did the trick?
> Failing that, a list of the bug fixes that you applied?
>
> Thanx, Paul
>
>> On Mon, Jun 23, 2008 at 8:13 PM, Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
>> > On Monday 23 June 2008 23:05, Paul E. McKenney wrote:
>> >> On Mon, Jun 23, 2008 at 09:54:52PM +1000, Nick Piggin wrote:
>> >> > On Monday 23 June 2008 13:51, Ryan Hope wrote:
>> >> > > Well, I get the hardlock on -mm without using reiser4; I am pretty
>> >> > > sure it is swap related.
>> >> >
>> >> > The guys seeing hangs don't use PREEMPT_RCU, do they?
>> >> >
>> >> > In my swapping tests, I found -mm3 to be stable with classic RCU, but
>> >> > on a hunch, I tried PREEMPT_RCU and it crashed a couple of times rather
>> >> > quickly. First crash was in find_get_pages so I suspected lockless
>> >> > pagecache doing something subtly wrong with the RCU API, but I just got
>> >> > another crash in __d_lookup:
>> >>
>> >> Could you please send me a repeat-by? (At least Alexey is no longer
>> >> alone!)
>> >
>> > OK, I had DEBUG_PAGEALLOC in the .config, which I think is probably
>> > important to reproduce it (but the fact that I'm reproducing oopses
>> > with << PAGE_SIZE objects like dentries and radix tree nodes indicates
>> > that there is even more free-before-grace activity going undetected --
>> > if you construct a test case using full pages, it might become even
>> > easier to detect with DEBUG_PAGEALLOC).
>> >
>> > 2 socket, 8 core x86 system.
>> >
>> > I mounted two tmpfs filesystems, one contains a single large file
>> > which is formatted as 1K block size ext3 and mounted loopback, the
>> > other is used directly. Linux kernel source is unpacked on each mount
>> > and concurrent make -j128 on each. This pushes it pretty hard into
>> > swap. Classic RCU survived another 5 hours of this last night.
>> >
>> > But that's a fairly convoluted test for an RCU problem. I expect it
>> > should be easier to trigger with something more targeted...
>> >
>
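
For anyone who wants to retry the reproduction Nick describes in the quoted
text above (two tmpfs mounts, one holding a single large file formatted as
1K-block ext3 and mounted loopback, a kernel tree unpacked on each, and a
concurrent make -j128 on both), here is a rough sketch that only prints the
commands. The mount points, tmpfs and image sizes, tarball path, and the
"make defconfig" step are all assumptions made for illustration; none of them
come from the thread. The sizes in particular have to be tuned so the
workload actually pushes the machine into swap.

```python
#!/usr/bin/env python3
"""Dry-run sketch of the swap-stress setup described in the thread.

Assumptions (NOT from the thread): every path, both sizes, and the
`make defconfig` step. Nothing is executed; commands are only printed.
"""
import shlex

TMPFS_DIRECT = "/mnt/tmpfs-direct"   # hypothetical: tmpfs used directly
TMPFS_IMAGE = "/mnt/tmpfs-image"     # hypothetical: tmpfs holding the ext3 image
LOOP_MOUNT = "/mnt/ext3-loop"        # hypothetical: loopback mount of the image
IMAGE = TMPFS_IMAGE + "/ext3.img"    # hypothetical image file name
TARBALL = "/root/linux.tar.bz2"      # hypothetical kernel source tarball
JOBS = 128                           # from the thread: make -j128


def build_plan(tmpfs_size="4g", image_mb=3072):
    """Return (setup, builds) as lists of argv lists; sizes are placeholders."""
    setup = [
        ["mkdir", "-p", TMPFS_DIRECT, TMPFS_IMAGE, LOOP_MOUNT],
        ["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", TMPFS_DIRECT],
        ["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", TMPFS_IMAGE],
        # One large file on tmpfs, formatted as ext3 with a 1K block size.
        ["dd", "if=/dev/zero", f"of={IMAGE}", "bs=1M", f"count={image_mb}"],
        ["mkfs.ext3", "-F", "-b", "1024", IMAGE],
        ["mount", "-o", "loop", IMAGE, LOOP_MOUNT],
    ]
    builds = []
    for tree in (TMPFS_DIRECT, LOOP_MOUNT):
        # Unpack a kernel tree on each mount and give it a default config.
        setup.append(["tar", "-xjf", TARBALL, "-C", tree, "--strip-components=1"])
        setup.append(["make", "-C", tree, "defconfig"])
        builds.append(["make", "-C", tree, f"-j{JOBS}"])
    return setup, builds


def render(setup, builds):
    """Render the plan as shell text; the two builds run concurrently."""
    lines = [" ".join(shlex.quote(a) for a in cmd) for cmd in setup]
    lines += [" ".join(shlex.quote(a) for a in cmd) + " &" for cmd in builds]
    lines.append("wait")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(*build_plan()))
```

The script deliberately runs nothing itself, so the output can be reviewed
and adjusted before being executed as root. Per the quoted report, the kernel
under test was built with PREEMPT_RCU and DEBUG_PAGEALLOC enabled; classic RCU
survived the same load.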