Re: [PATCH v4] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes

From: Andrey Ryabinin
Date: Mon Jan 15 2018 - 07:29:19 EST




On 01/13/2018 01:57 AM, Shakeel Butt wrote:
> On Fri, Jan 12, 2018 at 4:24 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>> On Fri 12-01-18 00:59:38, Andrey Ryabinin wrote:
>>> On 01/11/2018 07:29 PM, Michal Hocko wrote:
>> [...]
>>>> I do not think so. Consider that this reclaim races with other
>>>> reclaimers. Now you are reclaiming a large chunk so you might end up
>>>> reclaiming more than necessary. SWAP_CLUSTER_MAX would reduce the over
>>>> reclaim to be negligible.
>>>>
>>>
>>> I did consider this, and I think I already explained that sort of race in a previous email.
>>> Whether "Task B" is really a single task in the cgroup or actually a bunch of reclaimers
>>> doesn't matter. That doesn't change anything.
>>
>> I would _really_ prefer two patches here. The first one removing the
>> hard coded reclaim count. That thing is just dubious at best. If you
>> _really_ think that the higher reclaim target is meaningful then make
>> it a separate patch. I am not convinced but I will not nack it either.
>> But it will make our life much easier if my over reclaim concern is
>> right and we will need to revert it. Conceptually those two changes are
>> independent anyway.
>>
>
> Personally I feel that the cgroup-v2 semantics are much cleaner for
> setting the limit. There is no race with the allocators in the memcg,
> though the oom-killer can be triggered. For cgroup-v1, the user does not
> expect the OOM killer, and EBUSY is expected on unsuccessful reclaim. How
> about we do something similar here and make sure the oom killer cannot be
> triggered for the given memcg?
>
> // pseudo code
> disable_oom(memcg)
> old = xchg(&memcg->memory.limit, requested_limit)
>
> reclaim memory until usage gets below new limit or retries are exhausted
>
> if (unsuccessful) {
>         reset_limit(memcg, old)
>         ret = EBUSY
> } else {
>         ret = 0
> }
> enable_oom(memcg)
>
> This way there is no race with the allocators and oom killer will not
> be triggered. The processes in the memcg can suffer but that should be
> within the expectation of the user. One disclaimer though, disabling
> oom for memcg needs more thought.

That might be worse. If the limit is too low, all allocations (except __GFP_NOFAIL, of course) will start
failing. And the kernel is not always careful enough in its -ENOMEM handling.
Also, it's not much different from oom killing everything; the end result is almost the same -
nothing will work in that cgroup.
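
For reference, the control flow of the quoted pseudo code can be written out as a small userspace toy model. This is only a sketch: struct toy_memcg, toy_reclaim() and toy_set_limit() are hypothetical names made up for illustration, not kernel APIs. The model is single-threaded, so it does not show concurrent allocators at all, and it assumes reclaimable <= usage on entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
	unsigned long usage;       /* pages currently charged */
	unsigned long limit;       /* current hard limit, in pages */
	unsigned long reclaimable; /* pages reclaim could still free (assumed <= usage) */
	bool oom_disabled;         /* hypothetical per-memcg OOM switch */
};

/* Pretend reclaim: free up to nr pages and return how many were freed. */
static unsigned long toy_reclaim(struct toy_memcg *cg, unsigned long nr)
{
	unsigned long freed = nr < cg->reclaimable ? nr : cg->reclaimable;

	cg->reclaimable -= freed;
	cg->usage -= freed;
	return freed;
}

/*
 * Lower the limit first, then reclaim; roll back and return -EBUSY if
 * usage cannot be brought under the new limit within the retry budget.
 */
static int toy_set_limit(struct toy_memcg *cg, unsigned long new_limit,
			 int retries)
{
	unsigned long old;
	int ret = 0;

	assert(cg != NULL);

	cg->oom_disabled = true;  /* disable_oom(memcg) */
	old = cg->limit;          /* xchg() in the pseudo code */
	cg->limit = new_limit;

	while (cg->usage > cg->limit && retries-- > 0) {
		/* Stop early once reclaim makes no progress. */
		if (toy_reclaim(cg, cg->usage - cg->limit) == 0)
			break;
	}

	if (cg->usage > cg->limit) {
		cg->limit = old;  /* reset_limit(memcg, old) */
		ret = -EBUSY;
	}

	cg->oom_disabled = false; /* enable_oom(memcg) */
	return ret;
}
```

In this model, between the point where the limit is lowered and the point where it is either met or rolled back, any charge that would exceed the new limit would have to fail rather than trigger the OOM killer. That window is exactly where the -ENOMEM concern above applies.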


> Shakeel
>