Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap

From: Henry Huang
Date: Fri Dec 22 2023 - 10:41:05 EST


Thanks for replying.

On Fri, Dec 22, 2023 at 13:14 PM David Rientjes wrote:
> - is the lack of predeterministic charging a problem for you? Are you
> initially faulting it in a manner that charges it to the "right" memcg
> and the refault of it after periodic reclaim can causing the charge to
> appear "randomly," i.e. to whichever process happened to access it
> next?

Initially, all pages are charged to cgroup A. Under memory pressure or after
proactive reclaim, some of them are dropped or swapped out. If a task in
cgroup B then accesses this shared memory before any task in cgroup A does,
those pages get charged to cgroup B.

This is common in our environment.

> - are pages ever shared between different memcg hierarchies? You
> mentioned sharing between processes in A and A/B, but I'm wondering
> if there is sharing between two different memcg hierarchies where root
> is the only common ancestor?

Yes, there is another really common case:
if the docker graph driver is overlayfs and different containers use the
same image or share the same lower layers, they share the file cache of
common binaries and libraries (e.g. libc.so).

> - do you anticipate a shorter scan period at some point? Proactively
> reclaiming all memory colder than one hour is a long time :) Are you
> concerned at all about the cost of doing your current idle bit
> harvesting approach becoming too expensive if you significantly reduce
> the scan period?

We don't want application owners to see a significant performance
degradation when swap is in use. Reclaiming pages whose idle age is less
than 1 hour carries a high risk of that. We have internal tests and
data analysis to support this.

We disabled both global swappiness and memcg swappiness,
so only proactive reclaim can swap out anon pages.

What's more, mglru has a more efficient way to scan the PTE accessed bit,
so we would prefer to have the mglru scan help us find and select idle pages.
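
For readers unfamiliar with the idle bit harvesting mentioned above, here is
a minimal sketch of how /sys/kernel/mm/page_idle/bitmap is addressed from
userspace. It follows the documented layout (an array of native-endian 8-byte
words, PFN i in bit i%64 of word i/64, written values OR-ed into the bitmap).
The helper names are made up for illustration and this is not our production
code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Path of the idle page tracking bitmap (needs CONFIG_IDLE_PAGE_TRACKING). */
#define PAGE_IDLE_BITMAP "/sys/kernel/mm/page_idle/bitmap"

/* Byte offset of the 64-bit word that holds the idle bit for @pfn. */
static inline off_t page_idle_word_offset(uint64_t pfn)
{
	return (off_t)(pfn / 64 * sizeof(uint64_t));
}

/* Mask selecting @pfn's bit inside that word. */
static inline uint64_t page_idle_bit_mask(uint64_t pfn)
{
	return UINT64_C(1) << (pfn % 64);
}

/*
 * Mark @pfn idle through @fd, an open descriptor for PAGE_IDLE_BITMAP.
 * Only the bit for @pfn is set in the written word, so other pages in
 * the same word are left unchanged. Returns 0 on success, -1 on error.
 */
static inline int page_idle_mark(int fd, uint64_t pfn)
{
	uint64_t word = page_idle_bit_mask(pfn);

	assert(fd >= 0);
	if (pwrite(fd, &word, sizeof(word), page_idle_word_offset(pfn)) !=
	    (ssize_t)sizeof(word))
		return -1;
	return 0;
}

/* Returns 1 if @pfn is still idle, 0 if it was accessed, -1 on error. */
static inline int page_idle_test(int fd, uint64_t pfn)
{
	uint64_t word;

	assert(fd >= 0);
	if (pread(fd, &word, sizeof(word), page_idle_word_offset(pfn)) !=
	    (ssize_t)sizeof(word))
		return -1;
	return (word & page_idle_bit_mask(pfn)) != 0;
}
```

A scanner would call page_idle_mark() for each tracked PFN, wait one scan
period, then call page_idle_test() to see which pages stayed idle. Doing
this per PFN is the cost that grows when the scan period shrinks.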

> - is proactive reclaim being driven by writing to memory.reclaim, by
> enforcing a smaller memory.high, or something else?

Because all page info and idle ages are stored in userspace, the kernel can't
get this information directly. We carry a private patch that adds a new reclaim
interface to support reclaiming pages by specific PFNs.
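
To illustrate the kind of userspace bookkeeping this implies, here is a rough
sketch, under stated assumptions, of per-page idle age tracking and selection
of PFNs older than the 1 hour threshold. The struct, the function names and
the scan period parameter are hypothetical. The private reclaim interface
itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Threshold from this mail: only pages idle for at least one hour. */
#define IDLE_AGE_THRESHOLD_SEC 3600u

/* Hypothetical per-page record kept by the userspace agent. */
struct page_age {
	uint64_t pfn;
	uint32_t idle_sec;	/* time since the page was last seen accessed */
};

/*
 * Update one record after a scan: accumulate the scan period if the
 * page stayed idle, reset the age if it was accessed.
 */
static inline void page_age_update(struct page_age *p, int still_idle,
				   uint32_t scan_period_sec)
{
	assert(p != NULL);
	p->idle_sec = still_idle ? p->idle_sec + scan_period_sec : 0;
}

/*
 * Copy the PFNs whose idle age reached the threshold into @out
 * (at most @out_cap entries). Returns the number of PFNs written.
 */
static inline size_t collect_cold_pfns(const struct page_age *pages, size_t n,
				       uint64_t *out, size_t out_cap)
{
	size_t count = 0;

	assert(pages != NULL && out != NULL);
	for (size_t i = 0; i < n && count < out_cap; i++) {
		if (pages[i].idle_sec >= IDLE_AGE_THRESHOLD_SEC)
			out[count++] = pages[i].pfn;
	}
	return count;
}
```

The resulting PFN list is what would be handed to a PFN-based reclaim
interface.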

--
2.43.0