Re: getting oom/stalls for ltp test cpuset01 with latest/4.9 kernel

From: Vlastimil Babka
Date: Fri Jan 13 2017 - 04:06:27 EST


On 01/13/2017 05:35 AM, Ganapatrao Kulkarni wrote:
> On Thu, Jan 12, 2017 at 4:40 PM, Vlastimil Babka <vbabka@xxxxxxx> wrote:
>> On 01/11/2017 05:46 PM, Michal Hocko wrote:
>>>
>>> On Wed 11-01-17 21:52:29, Ganapatrao Kulkarni wrote:
>>>
>>>> [ 2398.169391] Node 1 Normal: 951*4kB (UME) 1308*8kB (UME) 1034*16kB
>>>> (UME) 742*32kB (UME) 581*64kB (UME) 450*128kB (UME) 362*256kB (UME)
>>>> 275*512kB (ME) 189*1024kB (UM) 117*2048kB (ME) 2742*4096kB (M) = 12047196kB
>>>
>>>
>>> Most of the memblocks are marked Unmovable (except for the 4MB blocks)
>>
>>
>> No, UME here means that e.g. 4kB blocks are available on unmovable, movable
>> and reclaimable lists.
>>
>>> which shouldn't matter because we can fall back to unmovable blocks for
>>> movable allocations AFAIR, so we shouldn't really fail the request. I
>>> really fail to see what is going on there, but it smells really
>>> suspicious.
>>
>>
>> Perhaps there's something wrong with the zonelists and we are skipping the
>> Node 1 Normal zone. Or there's some race with cpuset operations (but I can't
>> see how).
>>
>> The question is, how reproducible is this? And what exactly does the cpuset01
>> test do? Is it doing multiple things in a loop that could be reduced to a
>> single testcase?
>
> IIUC, the parent process keeps changing the node in cpuset.mems in a loop,
> while the child processes (one per CPU) keep allocating and freeing
> 10 pages until the execution time is over.
> More details at
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/cpuset/cpuset01.c

Ah, thanks for explaining. It looks like there might be a race: ac.preferred_zone
is determined using current_mems_allowed as ac.nodemask, and if cpuset.mems is
updated in the meantime, that lookup can skip the only zone that is allowed after
the update. We only recalculate ac.preferred_zone for allocations that are allowed
to escape cpusets/watermarks. Since the zonelist walk starts from the preferred
zone, we thus see only part of the zonelist and miss the only allowed zone. This
would be due to commit 682a3385e773 ("mm, page_alloc: inline the fast path of the
zonelist iterator") and/or other commits from that series.

Could you try with the following patch please? It also tries to protect against
a race with the removal of the last non-root cpuset, which could cause
cpusets_enabled() to become false in the middle of the function.

----8<----