Re: [RFC] mm: kmemleak: replace __GFP_NOFAIL to GFP_NOWAIT in gfp_kmemleak_mask

From: Dmitry Vyukov
Date: Tue Apr 24 2018 - 13:16:28 EST


On Tue, Apr 24, 2018 at 7:02 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Tue 24-04-18 12:48:50, Chunyu Hu wrote:
>>
>>
>> ----- Original Message -----
>> > From: "Michal Hocko" <mhocko@xxxxxxxxxx>
>> > To: "Chunyu Hu" <chuhu.ncepu@xxxxxxxxx>
>> > Cc: "Dmitry Vyukov" <dvyukov@xxxxxxxxxx>, "Catalin Marinas" <catalin.marinas@xxxxxxx>, "Chunyu Hu"
>> > <chuhu@xxxxxxxxxx>, "LKML" <linux-kernel@xxxxxxxxxxxxxxx>, "Linux-MM" <linux-mm@xxxxxxxxx>
>> > Sent: Tuesday, April 24, 2018 9:20:57 PM
>> > Subject: Re: [RFC] mm: kmemleak: replace __GFP_NOFAIL to GFP_NOWAIT in gfp_kmemleak_mask
>> >
>> > On Mon 23-04-18 12:17:32, Chunyu Hu wrote:
>> > [...]
>> > > So if there is a new flag, it would be the 25th bit.
>> >
>> > No new flags please. Can you simply store a simple bool into fail_page_alloc
>> > and have save/restore api for that?
>>
>> Hi Michal,
>>
>> I still don't get your point. The original NOFAIL was added in kmemleak
>> to skip fault injection in page/slab allocations for kmemleak objects,
>> since kmemleak disables itself until the next reboot whenever it hits an
>> allocation failure; once that happens, it can no longer check for leaks
>> in the error paths exercised by fault injection. But NOFAIL does more than
>> skip fault injection; it also makes the allocation itself try hard. So a
>> dedicated flag for skipping fault injection in specific slab/page
>> allocations was mentioned.
>
> I am not all that familiar with kmemleak, but fiddling with the
> gfp_mask is the wrong way to achieve a kmemleak-specific action. I might be

I would say this is more of a slab fault-injection-specific action. It
could be used by other debugging facilities too. Slab fault injection is
part of slab, and slab behavior is generally controlled through gfp_mask.
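
A minimal user-space sketch of the gfp-flag approach, not kernel code: a
per-call flag tells the fault injector to leave that one allocation alone.
Every name here (sketch_gfp_t, SKETCH_GFP_NO_FAULT_INJ, sketch_alloc and
so on) is hypothetical and made up for illustration; none of them is an
existing kernel symbol or flag value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bits for this sketch; not the kernel's real gfp values. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_KERNEL        0x1u
#define SKETCH_GFP_NO_FAULT_INJ  0x2u  /* hypothetical: skip fault injection */

/* When true, the injector fails every allocation it is allowed to fail. */
static bool sketch_inject_failures;

/* The flag travels with the call, so no global or per-task state is needed. */
static bool sketch_should_fail(sketch_gfp_t flags)
{
    if (flags & SKETCH_GFP_NO_FAULT_INJ)
        return false;
    return sketch_inject_failures;
}

/* Allocation entry point: consults the injector before really allocating. */
static void *sketch_alloc(size_t size, sketch_gfp_t flags)
{
    assert(size > 0);
    if (sketch_should_fail(flags))
        return NULL;
    return malloc(size);
}
```

With injection enabled, a plain SKETCH_GFP_KERNEL allocation fails while
one that also passes SKETCH_GFP_NO_FAULT_INJ still succeeds, which is the
per-allocation behavior a tracker like kmemleak would want.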

> easily wrong but I do not see any code that would restore the original
> gfp_mask down the kmem_cache_alloc path.
>
>> d9570ee3bd1d ("kmemleak: allow to coexist with fault injection")
>>
>> Do you mean something like the below, with the save/restore API? But it
>> looks like, to make it possible to skip a specific allocation rather than
>> disabling globally, a bool is not enough and a gfp flag is also needed.
>> Maybe I missed something?
>
> Yes, this is essentially what I meant. It is still a global thing which
> is not all that great and if it matters then you can make it per
> task_struct. That really depends on the code flow here.

If we go this route, it definitely needs to be per-task and also needs
to work with interrupts: the state has to be switched on interrupt entry
and must not be corrupted by interrupts. A gfp flag is free of these
problems.
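
To illustrate the extra bookkeeping, here is a minimal user-space sketch
of a per-task save/restore scheme, assuming a simulated "current task" and
simulated interrupt entry/exit hooks. All names (struct sketch_task,
sketch_skip_save, sketch_irq_enter and so on) are hypothetical and do not
correspond to existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-task field; not the real task_struct. */
struct sketch_task {
    bool skip_fault_inject;
};

/* Simulated "current" task pointer. */
static struct sketch_task *sketch_current;

/* Save the old state and ask the injector to skip this task's allocations. */
static bool sketch_skip_save(void)
{
    bool old = sketch_current->skip_fault_inject;
    sketch_current->skip_fault_inject = true;
    return old;
}

/* Put back whatever sketch_skip_save() returned. */
static void sketch_skip_restore(bool old)
{
    sketch_current->skip_fault_inject = old;
}

/* Interrupt entry: handler allocations must not inherit the task's state. */
static bool sketch_irq_enter(void)
{
    bool saved = sketch_current->skip_fault_inject;
    sketch_current->skip_fault_inject = false;
    return saved;
}

/* Interrupt exit: restore exactly what the interrupted code had set. */
static void sketch_irq_exit(bool saved)
{
    sketch_current->skip_fault_inject = saved;
}

/* The injector may fail an allocation only when skipping is off. */
static bool sketch_task_should_fail(void)
{
    return !sketch_current->skip_fault_inject;
}
```

Each save needs a matching restore, and the interrupt path needs its own
pair, whereas a flag passed with the allocation call needs neither.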