This has been discussed before, I can probably find it in the archives
if you want to go back and see it.
Yes. IIUC, we agreed to have an independent kmem limit. I just want to think
it over again because there are too many proposals and I am getting confused.
I'll try to cook a PoC.
Seems interesting.
But in a nutshell:
1) Supposing the independent knob disappears (I will explain in item 2 why I
don't want it to), I don't think a flag makes sense either. *If* we are
planning to enable/disable this, it might make more sense to put some
work on it, and allow particular slabs to be enabled/disabled by writing
to memory.kmem.slabinfo (-* would disable all, +* enable all, +kmalloc*
enable all kmalloc, etc).
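
To make the +/- pattern idea above concrete, here is a rough userspace sketch
of how such rules could be evaluated. It is only an illustration: the function
name apply_slab_rules, the rule list format and the cache names are made up
for this example, and shell-style globbing via Python's fnmatch is an
assumption about what "+kmalloc*" would mean, not an existing kernel interface.

```python
from fnmatch import fnmatchcase


def apply_slab_rules(cache_names, rules, enabled=None):
    """Apply '+pattern' / '-pattern' rules in order; return the enabled caches.

    Hypothetical helper: models writes to memory.kmem.slabinfo as a list of
    rule strings. Nothing here corresponds to real kernel code.
    """
    enabled = set(enabled or ())
    for rule in rules:
        if len(rule) < 2 or rule[0] not in "+-":
            raise ValueError(f"bad rule: {rule!r}")
        sign, pattern = rule[0], rule[1:]
        # Shell-style glob match against every known cache name.
        matched = {name for name in cache_names if fnmatchcase(name, pattern)}
        if sign == "+":
            enabled |= matched
        else:
            enabled -= matched
    return enabled


# Example cache names, chosen only for illustration.
caches = ["kmalloc-64", "kmalloc-128", "dentry", "inode_cache"]

# Disable everything, then enable only the kmalloc caches.
print(sorted(apply_slab_rules(caches, ["-*", "+kmalloc*"])))
```

Rules are applied in the order written, so a later rule overrides an earlier
one for the caches it matches.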
Yes, I believe so. It is a big improvement over the current interface.
All that said, while reading your message, thinking a bit, the following
crossed my mind:
- We can account the slabs to memcg->res normally, and just store the
information that this is kernel memory into a percpu counter, as
I proposed recently.
OK, then the user can see the amount of kernel memory.
- The knob goes away, and becomes implicit: if you ever write anything
to memory.kmem.limit_in_bytes, we transfer that memory to a separate
kmem res_counter, and proceed from there. We can keep accounting to
memcg->res anyway, just that kernel memory will now have a separate
limit.
Okay, then,
kmem_limit < memory.limit < memsw.limit
...seems reasonable to me.
This means the user can specify the 'ratio' of kmem within memory.limit.
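
As a way to check my understanding of the scheme above, here is a toy model
of the accounting. It is not kernel code: the classes ResCounter and Memcg,
their methods, and the byte values are invented for illustration. Two things
are assumptions on my side: kernel usage is tracked from the start (standing
in for the percpu counter), and a kmem limit above the memory limit is
rejected, which is my reading of the ordering kmem_limit < memory.limit.

```python
class ResCounter:
    """Toy stand-in for a res_counter: a usage value with an optional limit."""

    def __init__(self, limit=None):
        self.usage = 0
        self.limit = limit  # None means no limit has been set

    def can_charge(self, nbytes):
        return self.limit is None or self.usage + nbytes <= self.limit


class Memcg:
    """Hypothetical model: kmem is always charged to res, and separately
    limited only after a kmem limit has been written."""

    def __init__(self, mem_limit):
        self.res = ResCounter(mem_limit)  # models memory.limit_in_bytes
        self.kmem = ResCounter()          # models memory.kmem.limit_in_bytes

    def set_kmem_limit(self, limit):
        # Assumption: a kmem limit larger than the memory limit is refused.
        if self.res.limit is not None and limit > self.res.limit:
            raise ValueError("kmem limit must not exceed memory limit")
        self.kmem.limit = limit

    def charge(self, nbytes, kernel=False):
        # Every charge must fit in res; kernel charges must also fit in kmem.
        if not self.res.can_charge(nbytes):
            return False
        if kernel and not self.kmem.can_charge(nbytes):
            return False
        self.res.usage += nbytes
        if kernel:
            self.kmem.usage += nbytes
        return True


cg = Memcg(mem_limit=100)
cg.charge(60, kernel=True)          # counted in both res and kmem
cg.set_kmem_limit(80)               # the implicit knob: kmem is now limited
print(cg.charge(30, kernel=True))   # would push kmem past its limit
print(cg.charge(30))                # user memory still fits in res
```

In this model the user always sees kernel usage, and the separate limit only
starts to matter once something is written to it.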
That's hard to do. The users of the cache have this information; the underlying slab/slub/slob code does not. We need to rely on the cache owner to provide this, and to provide it correctly. So the chance that we'll have incorrect information here grows by quite a bit.
More consideration will be interesting.
- We can show the amount of reclaimable kmem by some means ?
- What happens when a new cgroup created ?
- Should we have 'ratio' interface in kernel level ?
I personally don't like a ratio interface. I believe specifying "kmem should never be allowed to go over X bytes" is more than enough.
- What happens at task moving ?
- Should we allow per-slab accounting knob in /sys/kernel/slab/xxx ?
or somewhere ?
- Should we show per-memcg usage in /sys/kernel/slab/xxx ?
I guess so.
- Should we have force_empty for kmem (as last resort) ?
We do that when the cgroup is going away. From user action, I suspect the best we can do is call the shrinkers, and see if the objects get freed.
Yes. For the next round, we need to add some more detailed benchmarks.
With any implementation, my concern is
- overhead/performance.
- unreclaimable kmem
That's actually the reason behind all that!
- shared kmem between cgroups.
- With this scheme, it may not be necessary to ever have a file
memory.kmem.soft_limit_in_bytes. Reclaim is always part of the normal
memcg reclaim.
Good.
The scheme outlined above would work for us, and make the whole thing simpler,
I believe.
What do you think ?
It sounds interesting to me.
Thanks,
-Kame