Re: [patch 0/4] mm: memcontrol: populate unified hierarchy interface

From: Michal Hocko
Date: Tue Aug 05 2014 - 11:27:47 EST


On Tue 05-08-14 09:53:25, Johannes Weiner wrote:
> On Tue, Aug 05, 2014 at 02:40:33PM +0200, Michal Hocko wrote:
> > On Mon 04-08-14 17:14:53, Johannes Weiner wrote:
> > > Hi,
> > >
> > > the ongoing versioning of the cgroup user interface gives us a chance
> > > to clean up the memcg control interface and fix a lot of
> > > inconsistencies and ugliness that crept in over time.
> >
> > The first patch doesn't fit into the series and should be posted
> > separately.
>
> It's a prerequisite for the high limit implementation.

I do not think it is strictly needed. I am even not sure whether the
patch is OK and have to think more about it. I think you can throttle
high limit breachers by SWAP_CLUSTER_MAX for now.

> > > This series adds a minimal set of control files to the new memcg
> > > interface to get basic memcg functionality going in unified hierarchy:
> >
> > Hmm, I have posted RFC for new knobs quite some time ago and the
> > discussion died without some questions answered and now you are coming
> > with a new one. I cannot say I would be happy about that.
>
> I remembered open questions mainly about other things like swappiness,
> charge immigration, kmem limits. My bad, I should have checked. Here
> are your concerns on these basic knobs from that email:
>
> ---
>
> On Thu, Jul 17, 2014 at 03:45:09PM +0200, Michal Hocko wrote:
> > On Wed 16-07-14 11:58:14, Johannes Weiner wrote:
> > > How about "memory.current"?
> >
> > I wanted old users to change the minimum possible when moving to unified
> > hierarchy so I didn't touch the old names.
> > Why should we make the end users life harder? If there is general
> > agreement I have no problem with renaming I just do not think it is
> > really necessary because there is no real reason why configurations
> > which do not use any of the deprecated or unified-hierarchy-only
> > features shouldn't run in both unified and legacy hierarchies without
> > any changes.
>
> There is the rub, though: you can't *not* use new interfaces. We are
> getting rid of the hard limit as the default and we really want people
> to rethink their configuration in the light of this. And even if you
> would just use the hard limit as before, there is no way we can leave
> the name 'memory.limit_in_bytes' when we have in fact 4 different
> limits.

We could theoretically keep a single limit and turn other limits into
watermarks. I am _not_ suggesting that now because I haven't thought
that through but I just think we should discuss other possible ways
before we go on.

> So I don't see any way how we can stay 100% backward compatible even
> with the most basic memcg functionality of setting an upper limit.
>
> And once you acknowledge that current users don't get around *some*
> adjustments, we really owe it to new users to present a clean and
> consistent interface.
>
> > I do realize that this is a _new_ API so we can do such radical changes
> > but I am also aware that some people have to maintain their stacks on
> > top of different kernels and it really sucks to maintain two different
> > configurations. In such a case it would be easier for those users to
> > stay with the legacy mode which is a fair option but I would much rather
> > see them move to the new API sooner rather than later.
>
> There is no way you can use the exact same scripts/configurations for
> the old and new API at the same time when the most basic way of using
> cgroups and memcg changed in v2.

OK, this is a fair point. Cgroups configuration is probably a bigger
problem. If a script rely on a tool/library to setup the hierarchy then
that tool/library can probably do the mapping from old names to new as
well otherwise it would need to be rewritten at least for cgroup part.

I have no idea how tools (e.g. libcgroup, libvirt and others) will adapt
to the new API and support of both APIs at the same time, though.

> ---
>
> > One of the concern was renaming knobs which represent the same
> > functionality as before. I have posted some concerns but haven't heard
> > back anything. This series doesn't give any rationale for renaming
> > either.
> > It is true we have a v2 but that doesn't necessarily mean we should put
> > everything upside down.
>
> I'm certainly not going out of my way to turn things upside down, but
> the old interface is outrageous. I'm sorry if you can't see that it
> badly needs to be cleaned up and fixed. This is the time to do that.

Of course I can see many problems. But please let's think twice and even
more times when doing radical changes. Many decisions sound reasonable
at the time but then they turn out bad much later.

> > > - memory.current: a read-only file that shows current memory usage.
> >
> > Even if we go with renaming existing knobs I really hate this name. The
> > old one was too long but this is not descriptive enough. Same applies to
> > max and high. I would expect at least limit in the name.
>
> Memory cgroups are about accounting and limiting memory usage. That's
> all they do. In that context, current, min, low, high, max seem
> perfectly descriptive to me, adding usage and limit seems redundant.

Getting naming right is always pain and different people will always
have different views. For example I really do not like memory.current
and would prefer memory.usage much more. I am not a native speaker but
`usage' sounds much less ambiguous to me. Whether shorter (without _limit
suffix) names for limits are better I don't know. They certainly seem
more descriptive with the suffix to me.

> We name syscalls creat() and open() and stat() because, while you have
> to look at the manpage once, they are easy to remember, easy to type,
> and they keep the code using them readable.
>
> memory.usage_in_bytes was the opposite approach: it tried to describe
> all there is to this knob in the name itself, assuming tab completion
> would help you type that long name. But we are more and more moving
> away from ad-hoc scripting of cgroups and I don't want to optimize for
> that anymore at the cost of really unwieldy identifiers.

I agree with you. _in_bytes is definitely excessive. It can be nicely
demonstrated by the fact that different units are allowed when setting
the value.

> Like with all user interfaces, we should provide a short and catchy
> name and then provide the details in the documentation.
>
> > > - memory.high: a file that allows setting a high limit on the memory
> > > usage. This is an elastic limit, which is enforced via direct
> > > reclaim, so allocators are throttled once it's reached, but it can
> > > be exceeded and does not trigger OOM kills. This should be a much
> > > more suitable default upper boundary for the majority of use cases
> > > that are better off with some elasticity than with sudden OOM kills.
> >
> > I also thought you wanted to have all the new limits in the single
> > series. My series is sitting idle until we finally come to conclusion
> > which is the first set of exposed knobs. So I do not understand why are
> > you coming with it right now.
>
> I still would like to, but I'm not sure we can get the guarantees
> working in time as unified hierarchy leaves its experimental status.

That shouldn't happen sooner than in next (maybe in 2) devel cycle(s)
(http://marc.info/?l=linux-kernel&m=140716172618366).

> And I'm fairly confident that we know how the upper limits should
> behave and that we are no longer going to change that, and that we
> have a decent understanding on how the guarantees are going to work.

I think we should first settle with the new knobs before we introduce
new ones. I understand you would like to have high limit as a preferable
way from the early beginning but I think that can wait while the new API
is still in devel mode.

--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/