Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

From: Minchan Kim
Date: Wed May 18 2011 - 01:17:23 EST


On Wed, May 18, 2011 at 4:22 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
> On Tue, May 17, 2011 at 2:00 AM, Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
>> On Sun, May 15, 2011 at 12:12:36PM -0400, Andrew Lutomirski wrote:
>>> On Sun, May 15, 2011 at 11:27 AM, Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
>>>
>>> That was probably because one of my testcases creates a 1.4GB file on
>>> ramfs.  (I can provoke the problem without doing evil things like
>>> that, but the test script is rather reliable at killing my system and
>>> it works fine on my other machines.)
>>
>> Ah I didn't read your first email.. I'm now running
>>
>> ./test_mempressure.sh 1500 1400 1
>>
>> with mem=2G and no swap, but cannot reproduce OOM.
>
> Do you have a Sandy Bridge laptop?  There was a recent thread on lkml
> suggesting that only Sandy Bridge laptops saw this problem.  Although
> there's something else needed to trigger it, because I can't do it
> from an initramfs I made that tried to show this problem.
>
>>
>> What's your kconfig?
>
> Attached.  This is 2.6.38.6.
>
>>
>>> If you want, I can try to generate a trace that isn't polluted with
>>> the evil ramfs file.
>>
>> No, thanks. However it would be valuable if you can retry with this
>> patch _alone_ (without the "if (need_resched()) return false;" change,
>> as I don't see how it helps your case).
>>
>> @@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>>  	 * must be balanced
>>  	 */
>>  	if (order)
>> -		return pgdat_balanced(pgdat, balanced, classzone_idx);
>> +		return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>  	else
>>  		return !all_zones_ok;
>>  }
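
Viewed in isolation, the one-line change above fixes an inverted test. The
sketch below is a hypothetical userspace model, not kernel code: the
function names are invented, and the booleans stand in for the results of
pgdat_balanced() and the all_zones_ok check rather than computing them. It
returns true when going back to sleep would be premature. In the unpatched
variant, a high-order wakeup keeps kswapd awake exactly when the node is
already balanced and lets it sleep when it is not.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the tail of sleeping_prematurely() after the fix.
 * node_balanced stands in for pgdat_balanced(), all_zones_ok for the
 * order-0 per-zone result. Returns true if kswapd should stay awake.
 */
static bool sleeping_prematurely_patched(int order, bool node_balanced,
					 bool all_zones_ok)
{
	assert(order >= 0);
	if (order)
		return !node_balanced;	/* high-order: premature until balanced */
	else
		return !all_zones_ok;	/* order-0: premature until all zones ok */
}

/* Pre-patch variant for comparison: the high-order test is inverted. */
static bool sleeping_prematurely_unpatched(int order, bool node_balanced,
					   bool all_zones_ok)
{
	assert(order >= 0);
	if (order)
		return node_balanced;
	else
		return !all_zones_ok;
}
```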
>
> Done.
>
> I logged in, added swap, and ran a program that allocated 1900MB of
> RAM and memset it.  The system lagged a bit but survived.  kswapd
> showed 10% CPU (which is odd, IMO, since I'm using aesni-intel and I
> think that all the crypt happens in kworker when aesni-intel is in
> use).

I think it's reasonable for kswapd to use 10% CPU for reclaim.

>
> Then I started Firefox, loaded gmail, and ran test_mempressure.sh.
> Kaboom!  (I.e. system was hung)  SysRq-F saved the system and produced

Hang?
Do you mean you see a soft lockup of kswapd, or that the mouse/keyboard
stops responding?

> the attached dump.  I had 6GB swap available, so there shouldn't have
> been any OOM.

Yes. It's strange, but we have seen such cases several times, AFAIR.

Let's look at your first OOM message.
(I intentionally don't inline the OOM message, because web Gmail mangles
it and that is very annoying for whoever reads it.)

Considering the min/low/high watermarks of the zones, no zone can meet
your allocation request (order-0, GFP_WAIT|IO|FS|HIGHMEM), so the OOM
itself is the natural result.
But what I wonder is this: we have lots of free swap space, as you said.
Why doesn't the VM swap out anon pages of the DMA32 zone instead of
hitting OOM?
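
To make the watermark argument concrete, here is a simplified order-0
check in the spirit of zone_watermark_ok(). It is a sketch under stated
assumptions: struct zone_model, enum wmark_model and the helper names are
hypothetical, a single lowmem_reserve value stands in for the
per-classzone reserve array, and the ALLOC_HIGH/ALLOC_HARDER adjustments
and the higher-order free-list loop are left out. A zone qualifies only if
its free pages stay above the chosen watermark plus the reserve; when no
zone qualifies, the allocator has to fall back to reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical selector mirroring the min/low/high watermarks. */
enum wmark_model { WM_MIN, WM_LOW, WM_HIGH };

/* Hypothetical, simplified zone; all values are in pages. */
struct zone_model {
	long free_pages;
	long wmark[3];		/* indexed by enum wmark_model */
	long lowmem_reserve;	/* stand-in for lowmem_reserve[classzone_idx] */
};

/* Order-0 only: usable if free pages stay above mark + reserve. */
static bool zone_ok_order0(const struct zone_model *z, enum wmark_model wm)
{
	assert(wm <= WM_HIGH);
	return z->free_pages > z->wmark[wm] + z->lowmem_reserve;
}

/* Index of the first zone that passes, or -1 if none can take the request. */
static int first_usable_zone(const struct zone_model *zones, size_t n,
			     enum wmark_model wm)
{
	for (size_t i = 0; i < n; i++)
		if (zone_ok_order0(&zones[i], wm))
			return (int)i;
	return -1;
}
```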

We are isolating anon pages of DMA32, as the log shows (i.e.,
isolated(anon):408kB), so I think the VM is behaving correctly.
The problem is that the task requests allocations faster than swapout
can keep up. The swap device becomes heavily congested, and most pages
being swapped out stay under PG_writeback. In the end,
shrink_page_list returns 0.
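
The congestion argument can be illustrated with a toy model. Everything in
it is hypothetical (enum page_model, shrink_list_model() and
complete_io_model() are not kernel interfaces); it only loosely follows
what shrink_page_list() does with anon pages: a dirty page is submitted
for swap-out but not freed in the same pass, a page still under
PG_writeback is skipped, and only a page whose swap-out has completed can
be freed. If the swap device completes nothing between passes, every pass
reclaims 0 pages even though swap space is plentiful.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states for the model. */
enum page_model { PM_DIRTY_ANON, PM_WRITEBACK, PM_SWAPPED, PM_FREED };

/*
 * One pass over an isolated list. Returns the number of pages freed,
 * i.e. the model's equivalent of nr_reclaimed.
 */
static size_t shrink_list_model(enum page_model *pages, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		switch (pages[i]) {
		case PM_DIRTY_ANON:	/* start swap-out, keep the page */
			pages[i] = PM_WRITEBACK;
			break;
		case PM_WRITEBACK:	/* still under writeback: skip */
			break;
		case PM_SWAPPED:	/* swap-out done: page can be freed */
			pages[i] = PM_FREED;
			freed++;
			break;
		case PM_FREED:
			break;
		}
	}
	return freed;
}

/* The congested swap device finishes at most io_budget writebacks. */
static void complete_io_model(enum page_model *pages, size_t n,
			      size_t io_budget)
{
	for (size_t i = 0; i < n && io_budget > 0; i++) {
		if (pages[i] == PM_WRITEBACK) {
			pages[i] = PM_SWAPPED;
			io_budget--;
		}
	}
}
```

In this model, a task that keeps calling shrink_list_model() faster than
complete_io_model() makes progress sees mostly zero returns.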

For high-order page reclaim, should_reclaim_stall lets us throttle the
allocating task. But for an order-0 page, should_reclaim_stall returns
_false_, so we end up seeing an OOM message even though swap has lots of
free space.
Does my guess make sense?
If so, does it make sense for OOM to happen in the order-0 case despite
lots of free swap space?
How about this?

Andrew, could you test this patch together with the !pgdat_balanced patch?
I think we shouldn't see an OOM message when we have lots of free swap space.

== CUT_HERE ==
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f73b865..cc23f04 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1341,10 +1341,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
 	if (current_is_kswapd())
 		return false;
 
-	/* Only stall on lumpy reclaim */
-	if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
-		return false;
-
 	/* If we have relaimed everything on the isolated list, no stall */
 	if (nr_freed == nr_taken)
 		return false;
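
A minimal model of what the first patch changes in should_reclaim_stall().
The struct and function names are hypothetical; the fields stand in for
current_is_kswapd(), the RECLAIM_MODE_SINGLE bit, nr_taken and nr_freed,
and the remaining priority-based checks of the real function are collapsed
into a single flag. Before the patch, an order-0 (single-page mode)
reclaimer never reaches those later checks, so it never stalls; after it,
the task can be throttled when some isolated pages were not freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for what should_reclaim_stall() inspects. */
struct stall_ctx {
	bool is_kswapd;			/* current_is_kswapd() */
	bool single_mode;		/* reclaim_mode & RECLAIM_MODE_SINGLE */
	unsigned long nr_taken;		/* pages isolated from the LRU */
	unsigned long nr_freed;		/* pages reclaimed from that list */
	bool later_checks_pass;		/* collapsed priority-based checks */
};

/* Before the patch: single-page (order-0) reclaim never stalls. */
static bool stall_before_patch(const struct stall_ctx *c)
{
	assert(c->nr_freed <= c->nr_taken);
	if (c->is_kswapd)
		return false;
	if (c->single_mode)
		return false;
	if (c->nr_freed == c->nr_taken)
		return false;
	return c->later_checks_pass;
}

/* After the patch: the RECLAIM_MODE_SINGLE early return is gone. */
static bool stall_after_patch(const struct stall_ctx *c)
{
	assert(c->nr_freed <= c->nr_taken);
	if (c->is_kswapd)
		return false;
	if (c->nr_freed == c->nr_taken)
		return false;
	return c->later_checks_pass;
}
```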



Then, if you no longer see the unnecessary OOM but still see the hang,
could you apply this patch on top of the previous one?

== CUT_HERE ==

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f73b865..703380f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2697,6 +2697,7 @@ static int kswapd(void *p)
 		if (!ret) {
 			trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
 			order = balance_pgdat(pgdat, order, &classzone_idx);
+			cond_resched();
 		}
 	}
 	return 0;
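
What the second patch adds is a voluntary scheduling point after each
balance_pgdat() pass. The sketch below is only a userspace analogue under
stated assumptions: balance_model() is a hypothetical stand-in for
balance_pgdat() that simply lowers the order by one per pass so the loop
has observable state, and sched_yield() approximates cond_resched(). They
are not equivalent: cond_resched() only reschedules when the scheduler has
flagged that another task should run, while sched_yield() requests a yield
on every call.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical stand-in for balance_pgdat(): drops the order by one. */
static int balance_model(int order)
{
	assert(order >= 0);
	return order > 0 ? order - 1 : 0;
}

/*
 * Userspace analogue of the patched kswapd loop: after every balance
 * pass, offer the CPU to other runnable threads before continuing.
 */
static int kswapd_loop_model(int order, int wakeups)
{
	for (int i = 0; i < wakeups; i++) {
		order = balance_model(order);
		sched_yield();	/* analogue of the added cond_resched() */
	}
	return order;
}
```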

--
Kind regards,
Minchan Kim