Re: [RFC PROPOSAL] memcg: per-memcg user space reclaim interface
From: Shakeel Butt
Date: Tue Jul 07 2020 - 13:03:06 EST
On Tue, Jul 7, 2020 at 5:14 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Fri 03-07-20 07:23:14, Shakeel Butt wrote:
> > On Thu, Jul 2, 2020 at 11:35 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Thu 02-07-20 08:22:22, Shakeel Butt wrote:
> > > [...]
> > > > Interface options:
> > > > ------------------
> > > >
> > > > 1) memcg interface e.g. 'echo 10M > memory.reclaim'
> > > >
> > > > + simple
> > > > + can be extended to target specific type of memory (anon, file, kmem).
> > > > - most probably restricted to cgroup v2.
> > > >
> > > > 2) fadvise(PAGEOUT) on cgroup_dir_fd
> > > >
> > > > + more general and applicable to other FSes (actually we are using
> > > > something similar for tmpfs).
> > > > + can be extended in future to just age the LRUs instead of reclaim or
> > > > some new use cases.
> > >
> > > Could you explain why memory.high as an interface to trigger pro-active
> > > memory reclaim is not sufficient. Also memory.low limit to protect
> > > latency sensitive workloads?
> >
> > Yes, we can use memory.high to trigger [proactive] reclaim in a memcg
> > but note that it can also introduce stalls in the application running
> > in that memcg. Let's suppose the memory.current of a memcg is 100MiB
> > and we want to reclaim 20MiB from it, we can set the memory.high to
> > 80MiB but any allocation attempt from the application running in that
> > memcg can get stalled/throttled. I want the functionality of the
> > reclaim without potential stalls.
>
> It would be great if the proposal mentioned this limitation.
>
Will do in the next version.

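To make the limitation concrete, here is a rough Python sketch of the
memory.high workaround from the example above. The helper names and the
cgroup directory argument are hypothetical; only the memory.current and
memory.high files are existing cgroup v2 interfaces. Lowering memory.high
this way does trigger reclaim, but any allocation the job makes while its
usage is above the new limit can be throttled, which is the stall I want
to avoid.

```python
import os

MiB = 1024 * 1024


def high_for_reclaim(current_bytes, reclaim_bytes):
    """Return the memory.high value that asks for reclaim_bytes of reclaim."""
    if reclaim_bytes < 0:
        raise ValueError("reclaim_bytes must be non-negative")
    # Clamp at zero: a target larger than the usage means "reclaim everything".
    return max(current_bytes - reclaim_bytes, 0)


def lower_memory_high(cgroup_dir, reclaim_bytes):
    """Hypothetical helper: set memory.high below the current usage.

    cgroup_dir is an assumed cgroup v2 directory for the job. This function
    is not called here; it needs write access to the cgroup files.
    """
    with open(os.path.join(cgroup_dir, "memory.current")) as f:
        current = int(f.read().strip())
    new_high = high_for_reclaim(current, reclaim_bytes)
    with open(os.path.join(cgroup_dir, "memory.high"), "w") as f:
        f.write(str(new_high))
    return new_high
```

With the numbers from the example, high_for_reclaim(100 * MiB, 20 * MiB)
gives 80 * MiB.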
> > The memory.min is for protection against the global reclaim and is
> > unrelated to this discussion.
>
> Well, I was talking about memory.low. It is not meant only to protect
> from the global reclaim. It can be used for balancing memory reclaim
> from _any_ external memory pressure source. So it is somehow related to
> the usecase you have mentioned.
>
For the uswapd use-case, my concern is not an external memory pressure
source but the application hitting its own memory.high limit and
getting throttled.

> What you consider a latency sensitive workload could be protected from
> directly induced reclaim latencies. You could use low events to learn
> about the external memory pressure and update your protection to allow
> for some reclaim. I do understand that this wouldn't solve your problem
> of who gets reclaimed and maybe that is the crux of why it is not
> applicable but that should really be mentioned explicitly.
>
The main aim of the proactive reclaim is to avoid causing external
memory pressure. The low events can be another source of information
that conveys the system-level situation to the 'Memory Overcommit
Controller'. So, I see the low events as complementary to, not a
replacement for, the reclaim interface.

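As a rough illustration of how a userspace controller could consume the
low events, here is a Python sketch that parses the flat-keyed contents of
a cgroup v2 memory.events file and reports how much the 'low' counter grew
between two samples. The function names and the sample text are made up
for illustration; what the controller does with the delta is left out.

```python
def parse_memory_events(text):
    """Parse 'key value' lines as found in cgroup v2 memory.events."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        events[key] = int(value)
    return events


def low_event_delta(before_text, after_text):
    """Return how many new 'low' events occurred between two samples."""
    before = parse_memory_events(before_text).get("low", 0)
    after = parse_memory_events(after_text).get("low", 0)
    return after - before


# Made-up sample contents, not taken from a real system.
SAMPLE_BEFORE = "low 3\nhigh 0\nmax 0\noom 0\noom_kill 0\n"
SAMPLE_AFTER = "low 7\nhigh 0\nmax 0\noom 0\noom_kill 0\n"
```

A controller could poll the file periodically and treat a non-zero delta
as a hint that the job's protection is under pressure.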
BTW, by "low events from external memory pressure", am I correct in
understanding that you meant an unrelated job reclaiming and
triggering low events on a job of interest? Or do you mean
partitioning a job into sub-jobs and then using the low events between
these sub-jobs somehow?