Re: [PATCH] mm/vmscan: respect cpuset policy during page demotion

From: Huang, Ying
Date: Mon Oct 31 2022 - 23:18:02 EST


"Huang, Ying" <ying.huang@xxxxxxxxx> writes:

> Michal Hocko <mhocko@xxxxxxxx> writes:
>
>> On Thu 27-10-22 15:39:00, Huang, Ying wrote:
>>> Michal Hocko <mhocko@xxxxxxxx> writes:
>>>
>>> > On Thu 27-10-22 14:47:22, Huang, Ying wrote:
>>> >> Michal Hocko <mhocko@xxxxxxxx> writes:
>>> > [...]
>>> >> > I can imagine workloads which wouldn't like to get their memory demoted
>>> >> > for some reason but wouldn't it be more practical to tell that
>>> >> > explicitly (e.g. via prctl) rather than configuring cpusets/memory
>>> >> > policies explicitly?
>>> >>
>>> >> If my understanding is correct, prctl() configures the process or
>>> >> thread.
>>> >
>>> > Not necessarily. There are properties which are per address space like
>>> > PR_[GS]ET_THP_DISABLE. This could be very similar.
>>> >
>>> >> How can we get process/thread configuration at demotion time?
>>> >
>>> > As already pointed out in previous emails. You could hook into
>>> > folio_check_references path, more specifically folio_referenced_one
>>> > where you have all that you need already - all vmas mapping the page and
>>> > then it is trivial to get the corresponding vm_mm. If at least one of
>>> > them has the flag set then the demotion is not allowed (essentially the
>>> > same model as VM_LOCKED).
>>>
>>> Got it! Thanks for detailed explanation.
>>>
>>> One bit may not be sufficient. For example, if we want to avoid or
>>> control cross-socket demotion and still allow demoting to slow memory
>>> nodes in local socket, we need to specify a node mask to exclude some
>>> NUMA nodes from demotion targets.
>>
>> Isn't this something to be configured on the demotion topology side? Or
>> do you expect there will be per process/address space usecases? I mean
>> different processes running on the same topology, one requesting local
>> demotion while other ok with the whole demotion topology?
>
> I think that it's possible for different processes to have different
> requirements.
>
> - Some processes don't care where the memory is placed: they prefer
> local, then fall back to remote if there is no free space.
>
> - Some processes want to avoid cross-socket traffic, bind to nodes of
> local socket.
>
> - Some processes want to avoid using slow memory, bind to fast memory
> nodes only.
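
To make the three cases above concrete, here is a rough userspace
model of restricting demotion targets with a per-process node mask.
It is only a sketch, not kernel code: nodemask_model_t, node_bit(),
allowed_demotion_targets() and may_demote() are names made up for this
illustration, and the 4-node layout in the comments is an assumed
example topology.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model, NOT the kernel's nodemask_t API. */
#define MAX_NODES 64
typedef uint64_t nodemask_model_t;	/* bit n set => node n is in the mask */

/* Build a mask containing only @node. */
nodemask_model_t node_bit(unsigned int node)
{
	assert(node < MAX_NODES);
	return (nodemask_model_t)1 << node;
}

/*
 * Assumed example topology:
 *   socket 0: node 0 (fast), node 2 (slow)
 *   socket 1: node 1 (fast), node 3 (slow)
 * so the topology's demotion targets for node 0 would be {2, 3}.
 *
 * Restrict the topology's demotion targets to the nodes the process
 * allows.  An empty result means the page must not be demoted.
 */
nodemask_model_t allowed_demotion_targets(nodemask_model_t topology_targets,
					  nodemask_model_t process_allowed)
{
	return topology_targets & process_allowed;
}

/* Non-zero if at least one demotion target survives the restriction. */
int may_demote(nodemask_model_t topology_targets,
	       nodemask_model_t process_allowed)
{
	return allowed_demotion_targets(topology_targets, process_allowed) != 0;
}
```

With topology targets {2, 3} for node 0, a process that allows all
nodes keeps {2, 3}, a process bound to socket 0 (nodes {0, 2}) keeps
only {2}, and a process bound to the fast nodes {0, 1} ends up with an
empty mask, that is, no demotion.  A single on/off bit can express
only the last case.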

Hi, Johannes, Wei, Jonathan, Yang, Aneesh,

We need your help. Do you or your organization have requirements to
restrict the page demotion target nodes? If so, can you share some
details of the requirements? For example, to avoid cross-socket
traffic, or to avoid using slow memory. And do you want to restrict
that with cpusets, memory policy, or some other interface?

Best Regards,
Huang, Ying