Re: [PATCH -V8 02/10] mm/numa: automatically generate node migration order

From: Zi Yan
Date: Tue Jun 22 2021 - 08:48:25 EST


On 22 Jun 2021, at 8:06, Dave Hansen wrote:

> Yan, your reply came through in HTML. It doesn't bother me too much,
> but you'll find your replies dropped by LKML and other mailing lists
> if you do this.

Apologies. I used the wrong text mode. Thanks for letting me know.

>
> On 6/21/21 7:50 AM, Zi Yan wrote:
>> Is there a plan to allow the user to change where the migration path
>> starts? Or, going one step further, to provide an interface that lets
>> the user specify the demotion path? Something like
>> /sys/devices/system/node/node*/node_demotion.
>
> We actually had this in an earlier series. I pulled it out because we
> don't really *need* this ABI at the moment. But, I totally agree that
> it would be handy for many things, including any non-obvious topology
> where the built-in ordering isn't optimal.
>
>> I don't think that's necessary at least for now. Do you know any
>> real world use case for this?
>>
>> In our P9+volta system, GPU memory is exposed as a NUMA node. For
>> the GPU workloads with data size greater than GPU memory size, it
>> will be very helpful to allow pages in GPU memory to be
>> migrated/demoted to CPU memory. With your current assumption, GPU
>> memory -> CPU memory demotion does not seem possible, right? This
>> should also apply to any system with device memory exposed as a
>> NUMA node, where workloads run on the device and use CPU memory
>> as a lower memory tier than the device memory.
>
> Yes, with the current ordering, CPU memory would be demoted to the
> GPU, not the other way around. The right way to fix this (on ACPI
> platforms at least) is probably to use the HMAT table and build the
> demotion based on any memory targets rather than just CPUs.
>
> That would be a great future enhancement to all of this. But, because
> not all systems have HMATs, we also need something more basic, which
> is what is in this series.

This information is very helpful. I agree that reading the HMAT table
is the right way. I will look into it. Thanks!



Best Regards,
Yan, Zi
