Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim
From: Greg Thelen
Date: Mon Jun 09 2014 - 18:53:02 EST
On Fri, Jun 06 2014, Michal Hocko <mhocko@xxxxxxx> wrote:
> Some users (e.g. Google) would like to have stronger semantics than the
> low limit currently offers. The fallback mode is not desirable and they
> prefer hitting the OOM killer rather than ignoring the low limit for
> protected groups. There are other possible use cases which can benefit
> from hard guarantees. I can imagine workloads where setting low_limit to
> the same value as hard_limit to prevent any reclaim at all makes a lot of
> sense because reclaim is much more disruptive than a restart of the load.
>
> This patch adds a new per-memcg memory.reclaim_strategy knob which
> tells what to do in a situation when memory reclaim cannot make any
> progress because all groups in the reclaimed hierarchy are within their
> low_limit. There are two options available:
> - low_limit_best_effort - the current mode when reclaim falls
>   back to the even reclaim of all groups in the reclaimed
>   hierarchy
> - low_limit_guarantee - groups within low_limit are never
>   reclaimed and OOM killer is triggered instead. OOM message
>   will mention the fact that the OOM was triggered due to
>   low_limit reclaim protection.
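
For reference, the two modes quoted above can be modelled with a small
userspace sketch. This is only an illustration of the described behaviour,
not kernel code: the Group class, the reclaim_targets() helper and the use
of MemoryError to stand in for an OOM kill are all hypothetical, and the
sketch assumes that "within low_limit" means usage <= low_limit.

```python
from dataclasses import dataclass
from typing import List

# Strategy names taken from the patch description.
BEST_EFFORT = "low_limit_best_effort"
GUARANTEE = "low_limit_guarantee"


@dataclass
class Group:
    """Hypothetical stand-in for a memcg in the reclaimed hierarchy."""
    name: str
    usage: int      # current memory usage, in bytes
    low_limit: int  # protected amount, in bytes


def reclaim_targets(groups: List[Group], strategy: str) -> List[str]:
    """Return the names of the groups that reclaim may scan.

    Raises MemoryError as a stand-in for invoking the OOM killer.
    """
    if strategy not in (BEST_EFFORT, GUARANTEE):
        raise ValueError(f"unknown reclaim strategy: {strategy}")

    # Assumption: a group is protected while usage <= low_limit.
    unprotected = [g.name for g in groups if g.usage > g.low_limit]
    if unprotected:
        return unprotected

    # Every group is within its low_limit, so reclaim cannot make progress.
    if strategy == BEST_EFFORT:
        # Fall back to even reclaim of all groups in the hierarchy.
        return [g.name for g in groups]

    # Guarantee mode: never reclaim protected groups, OOM instead.
    raise MemoryError("OOM triggered due to low_limit reclaim protection")
```

With every group inside its low_limit, best-effort mode returns all groups,
while guarantee mode raises instead of touching any of them.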
To (a) be consistent with the existing hard and soft limit APIs and (b)
allow use of both best-effort and guarantee memory limits, I wonder if
it's best to offer three per-memcg limits, rather than two limits (hard,
low_limit) and a related reclaim_strategy knob. The three limits I'm
thinking about are:
1) hard_limit (aka the existing limit_in_bytes cgroupfs file). No
   change needed here. This is an upper bound on a memcg hierarchy's
   memory consumption (assuming use_hierarchy=1).
2) best_effort_limit (aka desired working set). This allows an
   application or administrator to provide a hint to the kernel about
   the desired working set size. Before oom'ing, the kernel is allowed
   to reclaim below this limit. I think the current soft_limit_in_bytes
   claims to provide this. If we prefer to deprecate
   soft_limit_in_bytes, then a new desired_working_set_in_bytes (or,
   hopefully, a better named) API seems reasonable.
3) low_limit_guarantee which is a lower bound of memory usage. A memcg
   would prefer to be oom killed rather than operate below this
   threshold. Default value is zero to preserve compatibility with
   existing apps.
Logically hard_limit >= best_effort_limit >= low_limit_guarantee.
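
To make the ordering concrete, here is a rough userspace sketch of the
three limits and how reclaim might treat a memcg at a given usage. It is
not a proposed implementation: MemcgLimits, reclaim_class() and the
returned labels are hypothetical names, and the exact boundary handling
(<= versus <) is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemcgLimits:
    """Hypothetical container for the three per-memcg limits, in bytes."""
    hard_limit: int
    best_effort_limit: int
    low_limit_guarantee: int = 0  # zero default keeps existing behaviour

    def __post_init__(self) -> None:
        # Enforce hard_limit >= best_effort_limit >= low_limit_guarantee.
        if not (self.hard_limit >= self.best_effort_limit
                >= self.low_limit_guarantee >= 0):
            raise ValueError(
                "expected hard_limit >= best_effort_limit >= "
                "low_limit_guarantee >= 0")


def reclaim_class(limits: MemcgLimits, usage: int) -> str:
    """Classify how reclaim may treat a memcg at the given usage."""
    if usage <= limits.low_limit_guarantee:
        # Never reclaimed; the memcg would rather be OOM killed.
        return "protected"
    if usage <= limits.best_effort_limit:
        # Inside the desired working set: reclaim only before OOM.
        return "last_resort"
    # Above the desired working set: ordinary reclaim candidate.
    return "preferred"
```

For example, with MemcgLimits(hard_limit=1000, best_effort_limit=600,
low_limit_guarantee=200), a usage of 700 is a preferred reclaim target, 400
is reclaimed only as a last resort before OOM, and 200 is never reclaimed.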