Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)
From: Andrew Lutomirski
Date: Tue May 24 2011 - 07:55:48 EST
On Tue, May 24, 2011 at 7:24 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
> On Mon, May 23, 2011 at 9:34 PM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>> On Tue, May 24, 2011 at 10:19 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>>> On Sun, May 22, 2011 at 7:12 PM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>>>> Could you test the patch below, based on vanilla 2.6.38.6?
>>>> The expected result is that the system should never hang.
>>>> I hope this is the last test for the hang.
>>>>
>>>> Thanks.
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 292582c..1663d24 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>  	if (scanned == 0)
>>>>  		scanned = SWAP_CLUSTER_MAX;
>>>>
>>>> -	if (!down_read_trylock(&shrinker_rwsem))
>>>> -		return 1;	/* Assume we'll be able to shrink next time */
>>>> +	if (!down_read_trylock(&shrinker_rwsem)) {
>>>> +		/* Assume we'll be able to shrink next time */
>>>> +		ret = 1;
>>>> +		goto out;
>>>> +	}
>>>>
>>>>  	list_for_each_entry(shrinker, &shrinker_list, list) {
>>>>  		unsigned long long delta;
>>>> @@ -286,6 +289,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
>>>>  		shrinker->nr += total_scan;
>>>>  	}
>>>>  	up_read(&shrinker_rwsem);
>>>> +out:
>>>> +	cond_resched();
>>>>  	return ret;
>>>>  }
>>>>
>>>> @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>>>>  	 * must be balanced
>>>>  	 */
>>>>  	if (order)
>>>> -		return pgdat_balanced(pgdat, balanced, classzone_idx);
>>>> +		return !pgdat_balanced(pgdat, balanced, classzone_idx);
>>>>  	else
>>>>  		return !all_zones_ok;
>>>>  }
>>>
>>> So far with this patch I can't reproduce the hang or the bogus OOM.
>>>
>>> To be completely clear, I have COMPACTION, MIGRATION, and THP off, I'm
>>> running 2.6.38.6, and I have exactly two patches applied. One is the
>>> attached patch and the other is the fpu.ko/aesni_intel.ko merger
>>> which I need to get dracut to boot my box.
>>>
>>> For fun, I also upgraded to 8GB of RAM and it still works.
>>>
>>
>> Hmm. Could you test it with THP enabled and 2G of RAM?
>> Isn't that the original test environment?
>> Please don't change the test environment. :)
>
> The test that passed last night was an environment (hardware and
> config) that I had confirmed earlier as failing without the patch.
>
> I just re-tested my original config (from a backup -- migration,
> compaction, and thp "always" are enabled). I get bogus OOMs but not a
> hang. (I'm running with mem=2G right now -- I'll swap the DIMMs back
> out later on if you want.)
>
> I attached the bogus OOM (actually several that happened in sequence).
> They look readahead-related. There was plenty of free swap space.
Now with log actually attached.
>
> --Andy
>
Attachment:
bogus_oom.txt.xz
Description: application/xz
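
For readers following the last hunk of the quoted patch: the change to sleeping_prematurely() only flips the sense of the order > 0 return value. A minimal user-space sketch of that logic is below. It is not kernel code. The function names, the reduced argument lists, and node_balanced() (a simplified stand-in for pgdat_balanced() that treats a node as balanced when more than a quarter of its pages sit in balanced zones) are assumptions made for illustration only.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for pgdat_balanced(): the node counts as
 * balanced when more than 25% of its present pages are in balanced
 * zones. The real kernel helper takes a pg_data_t and a classzone index.
 */
bool node_balanced(unsigned long balanced_pages, unsigned long present_pages)
{
	return balanced_pages > (present_pages >> 2);
}

/*
 * Pre-patch logic: for order > 0 this returns "balanced", so a
 * balanced node is reported as a premature sleep and an unbalanced
 * one is not.
 */
bool sleeping_prematurely_old(int order, bool balanced, bool all_zones_ok)
{
	if (order)
		return balanced;
	return !all_zones_ok;
}

/*
 * Post-patch logic: sleeping is premature only when the node is
 * NOT balanced, which matches the order == 0 branch.
 */
bool sleeping_prematurely_fixed(int order, bool balanced, bool all_zones_ok)
{
	if (order)
		return !balanced;
	return !all_zones_ok;
}
```

With order = 1 and a balanced node, the old variant answers true and the fixed variant answers false. For order = 0 both variants agree, which is why the hunk leaves that branch untouched.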