Re: [RFC 1/7] mm: introduce MADV_COOL

From: Minchan Kim
Date: Tue May 21 2019 - 05:14:13 EST

On Tue, May 21, 2019 at 08:04:43AM +0200, Michal Hocko wrote:
> On Tue 21-05-19 07:54:19, Minchan Kim wrote:
> > On Mon, May 20, 2019 at 10:16:21AM +0200, Michal Hocko wrote:
> [...]
> > > > Internally, it works via deactivating memory from active list to
> > > > inactive's head so when the memory pressure happens, they will be
> > > > reclaimed earlier than other active pages unless there is no
> > > > access until the time.
> > >
> > > Could you elaborate about the decision to move to the head rather than
> > > tail? What should happen to inactive pages? Should we move them to the
> > > tail? Your implementation seems to ignore those completely. Why?
> >
> > Normally, inactive LRU could have used-once pages without any mapping
> > to user's address space. Such pages would be better candicate to
> > reclaim when the memory pressure happens. With deactivating only
> > active LRU pages of the process to the head of inactive LRU, we will
> > keep them in RAM longer than used-once pages and could have more chance
> > to be activated once the process is resumed.
> You are making some assumptions here. You have an explicit call what is
> cold now you are assuming something is even colder. Is this assumption a
> general enough to make people depend on it? Not that we wouldn't be able
> to change to logic later but that will always be risky - especially in
> the area when somebody want to make a user space driven memory
> management.

Think about MADV_FREE. It moves those pages into inactive file LRU's head.
See the get_scan_count which makes forceful scanning of inactive file LRU
if it has enough size based on the memory pressure.
The reason is it's likely to have used-once pages in inactive file LRU,
generally. Those pages has been top-priority candidate to be reclaimed
for a long time.

Only parts I am aware of moving pages into tail of inactive LRU are places
writeback is done for pages VM already decide to reclaim by LRU aging or
destructive operation like invalidating but couldn't completed. It's
really strong hints with no doubt.

> > > What should happen for shared pages? In other words do we want to allow
> > > less privileged process to control evicting of shared pages with a more
> > > privileged one? E.g. think of all sorts of side channel attacks. Maybe
> > > we want to do the same thing as for mincore where write access is
> > > required.
> >
> > It doesn't work with shared pages(ie, page_mapcount > 1). I will add it
> > in the description.
> OK, this is good for the starter. It makes the implementation simpler
> and we can add shared mappings coverage later.
> Although I would argue that touching only writeable mappings should be
> reasonably safe.
> --
> Michal Hocko
> SUSE Labs