Re: [RFC PATCH] memcg: export knobs for the default cgroup hierarchy

From: Michal Hocko
Date: Mon Jul 21 2014 - 05:07:33 EST


On Fri 18-07-14 19:44:43, Vladimir Davydov wrote:
> On Wed, Jul 16, 2014 at 11:58:14AM -0400, Johannes Weiner wrote:
> > On Wed, Jul 16, 2014 at 04:39:38PM +0200, Michal Hocko wrote:
> > > +#ifdef CONFIG_MEMCG_KMEM
> > > + {
> > > + .name = "kmem.limit_in_bytes",
> > > + .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
> > > + .write = mem_cgroup_write,
> > > + .read_u64 = mem_cgroup_read_u64,
> > > + },
> >
> > Does it really make sense to have a separate limit for kmem only?
> > IIRC, the reason we introduced this was that this memory is not
> > reclaimable and so we need to limit it.
> >
> > But the opposite effect happened: because it's not reclaimable, the
> > separate kmem limit is actually unusable for any values smaller than
> > the overall memory limit: because there is no reclaim mechanism for
> > that limit, once you hit it, it's over, there is nothing you can do
> > anymore. The problem isn't so much unreclaimable memory, the problem
> > is unreclaimable limits.
> >
> > If the global case produces memory pressure through kernel memory
> > allocations, we reclaim page cache, anonymous pages, inodes, dentries
> > etc. I think the same should happen for kmem: kmem should just be
> > accounted and limited in the overall memory limit of a group, and when
> > pressure arises, we go after anything that's reclaimable.
>
> Personally, I don't think there's much sense in having a separate knob
> for kmem limit either. Until we have a user with a sane use case for it,
> let's not propagate it to the new interface.

What about fork-bomb protection? I thought that was the primary use case
for K < U. How can we handle that use case with a single limit? A
special gfp flag to not trigger the OOM path when called from some kmem
charge paths?

What about task_count, or whatever the name was of the controller which
was dropped and suggested to be replaced by kmem accounting? I can
imagine that being implemented by a separate K limit which would be
roughly stack_size * task_count + some cushion for slab.
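
To make the arithmetic concrete, here is a minimal userspace sketch of
that estimate. The helper name and its parameters are hypothetical and
only for illustration; nothing like this exists in the kernel. The caller
has to pick the per-task stack size and the slab headroom.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: estimate a kmem limit (K) that
 * would cap a group at roughly task_count tasks.
 *
 *   K ~= stack_size * task_count + slab_headroom
 *
 * stack_size    - kernel stack bytes charged per task (an assumption,
 *                 it depends on the architecture and configuration)
 * task_count    - number of tasks the group should be allowed to have
 * slab_headroom - extra bytes left for slab objects (a guess)
 */
uint64_t kmem_limit_for_tasks(uint64_t stack_size, uint64_t task_count,
                              uint64_t slab_headroom)
{
	/* Saturate instead of wrapping around if the result would overflow. */
	if (task_count != 0 &&
	    stack_size > (UINT64_MAX - slab_headroom) / task_count)
		return UINT64_MAX;

	return stack_size * task_count + slab_headroom;
}
```

For example, assuming 16 KiB stacks, 1024 tasks and 8 MiB of slab
headroom, kmem_limit_for_tasks(16384, 1024, 8 << 20) gives 25165824
bytes (24 MiB). A real limit would also have to cover the other per-task
kernel allocations, which this sketch folds into the headroom.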
--
Michal Hocko
SUSE Labs