Re: [RFC PATCH] Add proc interface to set PF_MEMALLOC flags

From: Mike Christie
Date: Wed Sep 11 2019 - 11:44:43 EST


On 09/11/2019 05:07 AM, Tetsuo Handa wrote:
> On 2019/09/11 12:13, Hillf Danton wrote:
>>
>> On Tue, 10 Sep 2019 11:06:03 -0500 From: Mike Christie <mchristi@xxxxxxxxxx>
>>>
>>>> Really? Without any privilege check? So any random user can tap into
>>>> __GFP_NOIO allocations?
>>>
>>> That was a mistake on my part. I will add it in.
>>>
>> You may alternatively madvise a nutcracker as long as you would have
>> added a sledgehammer under /proc instead of a gavel.
>>
>> --- a/include/uapi/asm-generic/mman-common.h
>> +++ b/include/uapi/asm-generic/mman-common.h
>> @@ -45,6 +45,7 @@
>> #define MADV_SEQUENTIAL 2 /* expect sequential page references */
>> #define MADV_WILLNEED 3 /* will need these pages */
>> #define MADV_DONTNEED 4 /* don't need these pages */
>> +#define MADV_NOIO 5 /* set PF_MEMALLOC_NOIO */
>>
>> /* common parameters: try to keep these consistent across architectures */
>> #define MADV_FREE 8 /* free pages only if memory pressure */
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -716,6 +716,7 @@ madvise_behavior_valid(int behavior)
>> case MADV_WILLNEED:
>> case MADV_DONTNEED:
>> case MADV_FREE:
>> + case MADV_NOIO:
>> #ifdef CONFIG_KSM
>> case MADV_MERGEABLE:
>> case MADV_UNMERGEABLE:
>> @@ -813,6 +814,11 @@ SYSCALL_DEFINE3(madvise, unsigned long,
>> if (!madvise_behavior_valid(behavior))
>> return error;
>>
>> + if (behavior == MADV_NOIO) {
>> + current->flags |= PF_MEMALLOC_NOIO;
>
> Yes, for "modifying p->flags when p != current" is not permitted.
>
> But I guess that there is a problem. Setting PF_MEMALLOC_NOIO causes
> current_gfp_context() to mask __GFP_IO | __GFP_FS, but the OOM killer cannot
> be invoked when __GFP_FS is masked. As a result, any userspace thread which
> has PF_MEMALLOC_NOIO cannot invoke the OOM killer. If the userspace thread
> which uses PF_MEMALLOC_NOIO is involved in memory reclaiming activities,
> the memory reclaiming activities won't be able to make forward progress when
> the userspace thread triggered e.g. a page fault. Can the "userspace components
> that can run in the IO path" survive without any memory allocation?
>

Yes and no. When they can, they preallocate the resources they need to
make forward progress, similar to how kernel storage drivers do.
However, for some resources, such as those in the network layer, neither
userspace nor kernel drivers can preallocate, and both may fail.
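
For reference, the masking problem described above can be modelled in a
few lines of plain userspace C. This is only a rough sketch: the SK_*
bit values, struct sk_task and both helper names are made up for
illustration and are not the kernel's definitions. Only the logic is
meant to follow current_gfp_context() and the OOM killer's __GFP_FS
check as described in the quoted text.

```c
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real definitions. */
#define SK_GFP_IO            0x1u
#define SK_GFP_FS            0x2u
#define SK_PF_MEMALLOC_NOFS  0x1u
#define SK_PF_MEMALLOC_NOIO  0x2u

/* Hypothetical stand-in for the per-task flags word. */
struct sk_task {
	unsigned int flags;
};

/*
 * Model of the masking step: a NOIO task loses both IO and FS from
 * every allocation mask, a NOFS task loses only FS.
 */
static unsigned int sk_gfp_context(const struct sk_task *t, unsigned int gfp)
{
	if (t->flags & SK_PF_MEMALLOC_NOIO)
		gfp &= ~(SK_GFP_IO | SK_GFP_FS);
	else if (t->flags & SK_PF_MEMALLOC_NOFS)
		gfp &= ~SK_GFP_FS;
	return gfp;
}

/*
 * Model of the OOM killer precondition from the quoted text: an
 * allocation whose mask lacks FS cannot invoke the OOM killer.
 */
static bool sk_may_oom_kill(unsigned int gfp)
{
	return (gfp & SK_GFP_FS) != 0;
}
```

With flags == 0, a request for SK_GFP_IO | SK_GFP_FS passes through
unchanged and sk_may_oom_kill() is true. Once SK_PF_MEMALLOC_NOIO is
set, the same request is masked down to 0 and sk_may_oom_kill() is
false, which is the situation a thread would be in after the proposed
MADV_NOIO call.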


>> + return 0;
>> + }
>> +
>> if (start & ~PAGE_MASK)
>> return error;
>> len = (len_in + ~PAGE_MASK) & PAGE_MASK;
>