Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

From: Minchan Kim
Date: Thu May 19 2011 - 20:17:20 EST


On Thu, May 19, 2011 at 11:16 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
> I just booted 2.6.38.6 with exactly two patches applied.  Config was
> the same as I emailed yesterday.  Userspace is F15.  First was
> "aesni-intel: Merge with fpu.ko" because dracut fails to boot my
> system without it.  Second was this (sorry for whitespace damage):
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 0665520..3f44b81 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -307,7 +307,7 @@ static void set_reclaim_mode(int priority, struct scan_control *sc,
>          */
>         if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
>                 sc->reclaim_mode |= syncmode;
> -       else if (sc->order && priority < DEF_PRIORITY - 2)
> +       else if ((sc->order && priority < DEF_PRIORITY - 2) || priority <= DEF_PRIORITY / 3)
>                 sc->reclaim_mode |= syncmode;
>         else
>                 sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
> @@ -1342,10 +1342,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
>         if (current_is_kswapd())
>                 return false;
>
> -       /* Only stall on lumpy reclaim */
> -       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
> -               return false;
> -
>         /* If we have relaimed everything on the isolated list, no stall */
>         if (nr_freed == nr_taken)
>                 return false;
>
> I started GNOME and Firefox, enabled swap, and ran test_mempressure.sh
> 1500 1400 1.  The system quickly gave the attached oops.
>
> The oops was the ud2 here:
>
>   0xffffffff810d251b <+215>:   mov    -0x28(%rbx),%rax
>   0xffffffff810d251f <+219>:   test   $0x40,%al
>   0xffffffff810d2521 <+221>:   je     0xffffffff810d2525 <shrink_page_list+225>
>   0xffffffff810d2523 <+223>:   ud2
>
> Please let me know what the next test to run is.

Okay. The first patch I sent you (!pgdat_balanced plus a cond_resched
right after balance_pgdat) was successful, but the version with
cond_resched removed hung.

Let's not complicate the problem.
So let's put the patch above aside for now.

Would you be willing to run one more test with the patch below?
(It will certainly be whitespace-damaged. I can't do anything about
that from my office. Sorry.)
If the patch below still fixes your problem, as my first patch did, we
will push it to mainline.

Thanks, Andrew.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 292582c..1663d24 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
         if (scanned == 0)
                 scanned = SWAP_CLUSTER_MAX;

-        if (!down_read_trylock(&shrinker_rwsem))
-                return 1; /* Assume we'll be able to shrink next time */
+        if (!down_read_trylock(&shrinker_rwsem)) {
+                /* Assume we'll be able to shrink next time */
+                ret = 1;
+                goto out;
+        }

         list_for_each_entry(shrinker, &shrinker_list, list) {
                 unsigned long long delta;
@@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
                 shrinker->nr += total_scan;
         }
         up_read(&shrinker_rwsem);
+out:
+        cond_resched();
         return ret;
 }

@@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
          * must be balanced
          */
         if (order)
-                return pgdat_balanced(pgdat, balanced, classzone_idx);
+                return !pgdat_balanced(pgdat, balanced, classzone_idx);
         else
                 return !all_zones_ok;
 }
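
For reference, the last hunk flips the order > 0 case of
sleeping_prematurely(). The order-0 branch already returns true when
the zones are not OK, so the high-order branch should likewise return
true when the node is not balanced. Below is a minimal user-space
sketch of that logic, not kernel code: the function names and the
boolean parameters are hypothetical stand-ins (node_balanced models
the result of pgdat_balanced(), all_zones_ok mirrors the variable in
the hunk).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model only. node_balanced stands in for the value that
 * pgdat_balanced() would return; all_zones_ok mirrors the variable
 * used in the hunk. All function names here are hypothetical.
 */

/* Logic before the patch: the high-order branch is inverted. */
bool sleeping_prematurely_old(int order, bool node_balanced,
                              bool all_zones_ok)
{
        if (order)
                return node_balanced;
        return !all_zones_ok;
}

/* Logic after the patch: "premature" always means "not yet balanced". */
bool sleeping_prematurely_new(int order, bool node_balanced,
                              bool all_zones_ok)
{
        if (order)
                return !node_balanced;
        return !all_zones_ok;
}

/* Small self-check of the case that differs between the two versions. */
void check_model(void)
{
        /* High-order request on an already balanced node. */
        assert(sleeping_prematurely_old(1, true, true));
        assert(!sleeping_prematurely_new(1, true, true));
}
```

In this model, a high-order request on an already balanced node is
reported as a premature sleep by the old logic and as a safe sleep by
the new one. The order-0 path is the same in both versions.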



>
> --Andy
>



--
Kind regards,
Minchan Kim