Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

From: Andrew Lutomirski
Date: Wed May 18 2011 - 22:41:25 EST


On Wed, May 18, 2011 at 10:30 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Wed, 18 May 2011 22:15:53 -0400
> Andrew Lutomirski <luto@xxxxxxx> wrote:
>
>> On Wed, May 18, 2011 at 1:17 AM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>> > On Wed, May 18, 2011 at 4:22 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>
>> > Andrew, Could you test this patch with !pgdat_balanced patch?
>> > I think we shouldn't see OOM message if we have lots of free swap space.
>> >
>> > == CUT_HERE ==
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index f73b865..cc23f04 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1341,10 +1341,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
>> >        if (current_is_kswapd())
>> >                return false;
>> >
>> > -       /* Only stall on lumpy reclaim */
>> > -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
>> > -               return false;
>> > -
>> >        /* If we have relaimed everything on the isolated list, no stall */
>> >        if (nr_freed == nr_taken)
>> >                return false;
>> >
>> >
>> >
>> > Then, if you don't see any unnecessary OOM but still see the hangup,
>> > could you apply this patch based on previous?
>>
>> With this patch, I started GNOME and Firefox, turned on swap, and ran
>> test_mempressure.sh 1500 1400 1.  Instant panic (or OOPS and hang or
>> something -- didn't get the top part).  Picture attached -- it looks
>> like memcg might be involved.  I'm running F15, so it might even be
>> doing something.
>>
>
> Hmm, what kernel version do you use ?
> I think memcg is not guilty because RIP is shrink_page_list().
> But ok, I'll dig this. Could you give us your .config ?

Attached.

The faulting address in shrink_page_list is a ud2 instruction, which (I think)
comes from VM_BUG_ON(PageActive(page)). The sequence is:

0xffffffff810d24cc <+202>: callq 0xffffffff810cf930 <test_and_set_bit>
0xffffffff810d24d1 <+207>: test %eax,%eax
0xffffffff810d24d3 <+209>: jne 0xffffffff810d2aa5 <shrink_page_list+1699>
0xffffffff810d24d9 <+215>: mov -0x28(%rbx),%rax
0xffffffff810d24dd <+219>: test $0x40,%al
0xffffffff810d24df <+221>: je 0xffffffff810d24e3 <shrink_page_list+225>
0xffffffff810d24e1 <+223>: ud2
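
To spell out the reasoning: "test $0x40,%al" checks bit 6 of the flags word
loaded from the page, and the ud2 is reached only when that bit is set. A
minimal userspace sketch of the same check follows. It assumes PG_active is
bit 6, which is what I'd expect from the usual enum pageflags ordering but
have not verified against this exact tree. All sketch_* and SK_* names are
made up for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed flag ordering, mirroring the start of enum pageflags.
 * This is an assumption for illustration, not copied from the tree under test.
 */
enum sketch_pageflags {
    SK_PG_locked,      /* bit 0 */
    SK_PG_error,       /* bit 1 */
    SK_PG_referenced,  /* bit 2 */
    SK_PG_uptodate,    /* bit 3 */
    SK_PG_dirty,       /* bit 4 */
    SK_PG_lru,         /* bit 5 */
    SK_PG_active       /* bit 6 */
};

/* Hypothetical stand-in for struct page; only the flags word matters here. */
struct sketch_page {
    unsigned long flags;
};

/* Mask for a single flag bit. */
static unsigned long sketch_mask(int bit)
{
    return 1UL << bit;
}

/*
 * Equivalent of "test $0x40,%al; je <skip>; ud2":
 * returns true when the ud2 (the BUG) would be reached.
 */
static bool sketch_would_bug(const struct sketch_page *page)
{
    return (page->flags & sketch_mask(SK_PG_active)) != 0;
}

/* Sanity check that the assumed bit position matches the 0x40 immediate. */
static void sketch_self_check(void)
{
    assert(sketch_mask(SK_PG_active) == 0x40);
}
```

Under that assumption, a page with only the lru bit set (0x20) passes the
check, while any page with 0x40 set in its flags hits the ud2.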


--Andy

Attachment: .config
Description: Binary data