Re: [PATCH v4] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes

From: Shakeel Butt
Date: Mon Jan 15 2018 - 12:05:08 EST


On Mon, Jan 15, 2018 at 4:29 AM, Andrey Ryabinin
<aryabinin@xxxxxxxxxxxxx> wrote:
>
>
> On 01/13/2018 01:57 AM, Shakeel Butt wrote:
>> On Fri, Jan 12, 2018 at 4:24 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>> On Fri 12-01-18 00:59:38, Andrey Ryabinin wrote:
>>>> On 01/11/2018 07:29 PM, Michal Hocko wrote:
>>> [...]
>>>>> I do not think so. Consider that this reclaim races with other
>>>>> reclaimers. Now you are reclaiming a large chunk so you might end up
>>>>> reclaiming more than necessary. SWAP_CLUSTER_MAX would reduce the over
>>>>> reclaim to be negligible.
>>>>>
>>>>
>>>> I did consider this. And I think, I already explained that sort of race in previous email.
>>>> Whether "Task B" is really a task in cgroup or it's actually a bunch of reclaimers,
>>>> doesn't matter. That doesn't change anything.
>>>
>>> I would _really_ prefer two patches here. The first one removing the
>>> hard coded reclaim count. That thing is just dubious at best. If you
>>> _really_ think that the higher reclaim target is meaningfull then make
>>> it a separate patch. I am not conviced but I will not nack it it either.
>>> But it will make our life much easier if my over reclaim concern is
>>> right and we will need to revert it. Conceptually those two changes are
>>> independent anywa.
>>>
>>
>> Personally I feel that the cgroup-v2 semantics are much cleaner for
>> setting limit. There is no race with the allocators in the memcg,
>> though oom-killer can be triggered. For cgroup-v1, the user does not
>> expect OOM killer and EBUSY is expected on unsuccessful reclaim. How
>> about we do something similar here and make sure oom killer can not be
>> triggered for the given memcg?
>>
>> // pseudo code
>> disable_oom(memcg)
>> old = xchg(&memcg->memory.limit, requested_limit)
>>
>> reclaim memory until usage gets below new limit or retries are exhausted
>>
>> if (unsuccessful) {
>> reset_limit(memcg, old)
>> ret = EBUSY
>> } else
>> ret = 0;
>> enable_oom(memcg)
>>
>> This way there is no race with the allocators and oom killer will not
>> be triggered. The processes in the memcg can suffer but that should be
>> within the expectation of the user. One disclaimer though, disabling
>> oom for memcg needs more thought.
>
> That's might be worse. If limit is too low, all allocations (except __GFP_NOFAIL of course) will start
> failing. And the kernel not always careful enough in -ENOMEM handling.
> Also, it's not much different from oom killing everything, the end result is almost the same -
> nothing will work in that cgroup.
>

By disabling memcg oom, I meant to treat all allocations from that
memcg as __GFP_NOFAIL until the oom is disabled. I will see if I can
convert this into an actual code.

>
>> Shakeel
>>