Re: LTP hugemmap05 test case failure on arm64 with linux-next (next-20190613)

From: Mike Kravetz
Date: Thu Jun 27 2019 - 14:11:39 EST


On 6/24/19 2:53 PM, Mike Kravetz wrote:
> On 6/24/19 2:30 PM, Qian Cai wrote:
>> So the problem is that ipcget_public() holds the semaphore "ids->rwsem" for
>> too long, seemingly unnecessarily, and sometimes goes to sleep in direct
>> reclaim (other times, LTP hugemmap05 [1] has hugetlb_file_setup() return
>> -ENOMEM),
>
> Thanks for looking into this! I noticed that recent kernels could take a
> VERY long time trying to do high order allocations. In my case it was trying
> to do dynamic hugetlb page allocations as well [1]. But, IMO this is more
> of a general direct reclaim/compaction issue than something hugetlb specific.
>

<snip>

>> Ideally, it seems only ipc_findkey() and newseg() in this path need to hold the
>> semaphore to protect against concurrent access, so it could just be converted
>> to a spinlock instead.
>
> I do not have enough experience with this ipc code to comment on your proposed
> change. But, I will look into it.
>
> [1] https://lkml.org/lkml/2019/4/23/2

I only took a quick look at the ipc code, but there does not appear to be
a quick/easy change to make. The issue is that shared memory creation could
take a long time. With issue [1] above unresolved, creation of hugetlb backed
shared memory segments could take a VERY long time.
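
For anyone who wants to see how long creation takes on their own system,
a minimal userspace sketch along these lines might help. It is not taken
from LTP or from the kernel tree; time_hugetlb_shmget() is a hypothetical
helper name, and the fallback SHM_HUGETLB value is only there in case the
libc headers do not define it. It uses IPC_PRIVATE to avoid key collisions,
whereas the report above hit ipcget_public(), i.e. a non-private key.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <time.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000	/* assumption: value from the Linux uapi headers */
#endif

/*
 * Hypothetical helper: time a single shmget() that asks for a hugetlb
 * backed segment of 'size' bytes.  Returns the elapsed wall time in
 * seconds.  *err is set to 0 on success, or to the errno from shmget()
 * (e.g. ENOMEM when no huge pages could be allocated).
 */
static double time_hugetlb_shmget(size_t size, int *err)
{
	struct timespec start, end;
	int id;

	assert(err != NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);
	id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
	*err = (id < 0) ? errno : 0;	/* save errno before any other call */
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* Remove the segment right away so no huge pages stay reserved. */
	if (id >= 0)
		shmctl(id, IPC_RMID, NULL);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling this from main() with a size of a few huge pages and printing the
returned time and *err should show whether creation is slow, fails with
ENOMEM, or both, depending on how fragmented memory is at the time.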

I do not believe the test failure is arm specific. Most likely, it only
showed up there because testing was done on a system whose memory size
happens to trigger this issue.

My plan is to focus on [1]. When that is resolved, this issue should go away.
--
Mike Kravetz