Re: [PATCH] mm/vmscan: respect cpuset policy during page demotion

From: Feng Tang
Date: Mon Oct 31 2022 - 10:09:56 EST


On Mon, Oct 31, 2022 at 04:40:15PM +0800, Michal Hocko wrote:
> On Fri 28-10-22 07:22:27, Huang, Ying wrote:
> > Michal Hocko <mhocko@xxxxxxxx> writes:
> >
> > > On Thu 27-10-22 17:31:35, Huang, Ying wrote:
> [...]
> > >> I think that it's possible for different processes have different
> > >> requirements.
> > >>
> > >> - Some processes don't care about where the memory is placed, prefer
> > >> local, then fall back to remote if no free space.
> > >>
> > >> - Some processes want to avoid cross-socket traffic, bind to nodes of
> > >> local socket.
> > >>
> > >> - Some processes want to avoid to use slow memory, bind to fast memory
> > >> node only.
> > >
> > > Yes, I do understand that. Do you have any specific examples in mind?
> > > [...]
> >
> > Sorry, I don't have specific examples.
>
> OK, then let's stop any complicated solution right here then. Let's
> start simple with a per-mm flag to disable demotion of an address space.
> Should there ever be a real demand for a more fine grained solution
> let's go further but I do not think we want a half baked solution
> without real usecases.

Yes, the concern you and Yang raised about the high cost of the mempolicy
part is valid.

How about the cpuset part? We've received bug reports from several channels
about using cpuset+docker to control memory placement on memory tiering
systems, which led to 2 commits fixing them:

2685027fca38 ("cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in
cpuset_init_smp()")
https://lore.kernel.org/all/20220419020958.40419-1-feng.tang@xxxxxxxxx/

8ca1b5a49885 ("mm/page_alloc: detect allocation forbidden by cpuset and
bail out early")
https://lore.kernel.org/all/1632481657-68112-1-git-send-email-feng.tang@xxxxxxxxx/

From these bug reports, I think it's reasonable to say there are quite
a few real-world users of cpuset+docker on memory tiering systems. So
I plan to refine the original cpuset patch with some of the optimizations
discussed (like checking once for kswapd based shrink_folio_list()).

Thanks,
Feng

> --
> Michal Hocko
> SUSE Labs
>