Re: [PATCH v9 3/3] mm/madvise: introduce process_madvise() syscall: an external memory hinting API
From: Michal Hocko
Date: Mon Sep 21 2020 - 03:14:17 EST
On Mon 21-09-20 07:56:33, Christoph Hellwig wrote:
> On Mon, Aug 31, 2020 at 05:06:33PM -0700, Minchan Kim wrote:
> > There is usecase that System Management Software(SMS) want to give a
> > memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
> > case of Android, it is the ActivityManagerService.
> >
> > The information required to make the reclaim decision is not known to
> > the app. Instead, it is known to the centralized userspace
> > daemon(ActivityManagerService), and that daemon must be able to
> > initiate reclaim on its own without any app involvement.
> >
> > To solve the issue, this patch introduces a new syscall process_madvise(2).
> > It uses pidfd of an external process to give the hint. It also supports
> > vector address range because Android app has thousands of vmas due to
> > zygote so it's totally waste of CPU and power if we should call the
> > syscall one by one for each vma.(With testing 2000-vma syscall vs
> > 1-vector syscall, it showed 15% performance improvement. I think it
> > would be bigger in real practice because the testing ran very cache
> > friendly environment).
>
> I'm really not sure this syscall is a good idea. If you want central
> control you should implement an IPC mechanisms that allows your
> supervisor daemon to tell the application to perform the madvice
> instead of forcing the behavior on it.
Even though I am not entirely happy about the interface [1]. As it seems
I am in minority in my concern I backed off and decided to not block this
work because I do not see the problem with the functionality itself. And
I find it very useful for userspace driven memory management people are
asking for a long time.
This functionality shouldn't be much different from the standard memory
reclaim. It has some limitations (e.g. it can only handle mapped memory)
but allows to pro-actively swap out or reclaim disk based memory based
on a specific knowlege of the workload. Kernel is not able to do the
same.
[1] http://lkml.kernel.org/r/20200117115225.GV19428@xxxxxxxxxxxxxx
--
Michal Hocko
SUSE Labs