Re: [PATCH resend] memcg: introduce per-memcg reclaim interface
From: Roman Gushchin
Date: Mon Apr 04 2022 - 18:09:02 EST

On Mon, Apr 04, 2022 at 10:44:04AM +0200, Michal Hocko wrote:
> On Fri 01-04-22 09:58:59, Roman Gushchin wrote:
> > On Fri, Apr 01, 2022 at 03:49:19PM +0200, Michal Hocko wrote:
> > > On Thu 31-03-22 10:25:23, Roman Gushchin wrote:
> > > > On Thu, Mar 31, 2022 at 08:41:51AM +0000, Yosry Ahmed wrote:
> > > [...]
> > > > > - A similar per-node interface can also be added to support proactive
> > > > > reclaim and reclaim-based demotion in systems without memcg.
> > > >
> > > > Maybe an option to specify a timeout? That might simplify the userspace part.
> > >
> > > What do you mean by timeout here? Isn't
> > > timeout $N echo $RECLAIM > ....
> > >
> > > enough?
> >
> > It's nice and simple when it's a bash script, but when it's a complex
> > application trying to do the same, it quickly becomes less simple and
> > likely will require a dedicated thread to avoid blocking the main app
> > for too long and a mechanism to unblock it by timer/when the need arises.
> >
> > In my experience, using such semi-blocking interfaces correctly is tricky
> > (semi- because it's not clearly defined how much time the syscall can
> > take and whether it makes sense to wait longer).
>
> We have the same approach to setting other limits which need to perform
> the reclaim. Have we ever hit that as a limitation that would make
> userspace unnecessarily too complex?

The difference here is that some limits are most likely set once and
never adjusted, e.g. memory.max or memory.low.

I do definitely remember some issues around memory.high, but as I recall,
we've fixed them on the kernel side. We've even had a private memory.high.tmp
interface with a value and a timeout, which was later replaced with
a memory.reclaim interface similar to what we're discussing here.

But with memory.high we set the limit first, so if a user tries to reclaim
a lot of hot memory, it will soon put all processes in the cgroup into
sleep/direct reclaim. So it's not expected to block for too long.

In general it all comes down to the question of how hard the kernel should
try to reclaim the memory before giving up. Userspace might have different
needs in different cases. But if the interface is defined very vaguely, e.g.
"it tries for an undefined amount of time and then gives up", it's hard to
use it in a predictable manner.
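
To make the userspace side concrete, here is a minimal sketch of the
"dedicated thread" approach mentioned earlier in the thread, assuming the
interface is a plain file that accepts a text value and that the write
blocks while the kernel reclaims. The function name, the status strings and
the injectable `write` parameter are hypothetical and exist only for
illustration.

```python
import threading


def write_with_timeout(path, value, timeout_s, write=None):
    """Write `value` to `path` in a worker thread, waiting at most
    `timeout_s` seconds. Returns "done", "failed" or "timeout".

    Hypothetical helper: not part of any kernel or library API.
    """

    def default_write(p, v):
        # Assumption: the control file takes the value as plain text.
        with open(p, "w") as f:
            f.write(v)

    do_write = write or default_write
    result = {}

    def worker():
        try:
            do_write(path, value)
            result["status"] = "done"
        except OSError as exc:
            # E.g. the kernel gave up before reclaiming the requested amount.
            result["status"] = "failed"
            result["error"] = exc

    # Daemon thread, so a stuck write does not keep the process alive.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)

    if t.is_alive():
        # The caller stops waiting, but the write itself is NOT cancelled:
        # Python offers no way to interrupt the blocked thread.
        return "timeout"
    return result.get("status", "failed")
```

Even in this small form the caller needs a thread, a shared result and a
policy for what "timeout" means, and the blocked write still runs to
completion in the background. A timeout handled by the kernel would avoid
all of that.
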
Thanks!