Re: [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5

From: Bob Liu
Date: Mon Sep 11 2017 - 21:12:34 EST


On 2017/9/12 7:36, Jerome Glisse wrote:
> On Sun, Sep 10, 2017 at 07:22:58AM +0800, Bob Liu wrote:
>> On Wed, Sep 6, 2017 at 3:36 AM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
>>> On Thu, Jul 20, 2017 at 08:48:20PM -0700, Dan Williams wrote:
>>>> On Thu, Jul 20, 2017 at 6:41 PM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
>>>>> On Fri, Jul 21, 2017 at 09:15:29AM +0800, Bob Liu wrote:
>>>>>> On 2017/7/20 23:03, Jerome Glisse wrote:
>>>>>>> On Wed, Jul 19, 2017 at 05:09:04PM +0800, Bob Liu wrote:
>>>>>>>> On 2017/7/19 10:25, Jerome Glisse wrote:
>>>>>>>>> On Wed, Jul 19, 2017 at 09:46:10AM +0800, Bob Liu wrote:
>>>>>>>>>> On 2017/7/18 23:38, Jerome Glisse wrote:
>>>>>>>>>>> On Tue, Jul 18, 2017 at 11:26:51AM +0800, Bob Liu wrote:
>>>>>>>>>>>> On 2017/7/14 5:15, Jérôme Glisse wrote:
>>>
>>> [...]
>>>
>>>>>>> Second, device drivers are not integrated closely enough with the mm and
>>>>>>> scheduler kernel code to efficiently plug device access notification
>>>>>>> into struct page (i.e. to update struct page so that the NUMA worker
>>>>>>> thread can migrate memory based on accurate information).
>>>>>>>
>>>>>>> Third, it can be hard to decide who wins between CPU and device access
>>>>>>> when it comes to updating things like the last CPU id.
>>>>>>>
>>>>>>> Fourth, there is no such thing as a device id, i.e. an equivalent of the
>>>>>>> CPU id. If we were to add one, the CPU id field in the flags of struct
>>>>>>> page would not be big enough, so this could have repercussions on struct
>>>>>>> page size. This is not an easy sell.
>>>>>>>
>>>>>>> There are other issues I can't think of right now. I think for now it
>>>>>>
>>>>>> My opinion is that most of the issues are the same whether we use CDM or HMM-CDM.
>>>>>> I just care about a more complete solution, whether that is CDM, HMM-CDM or something else.
>>>>>> HMM and HMM-CDM depend on a device driver, but I haven't seen a public, full driver that
>>>>>> demonstrates the whole solution works.
>>>>>
>>>>> I am working with the NVidia closed source driver team to make sure that it works
>>>>> well for them. I am also working on the nouveau open source driver for the same NVidia
>>>>> hardware, though it will be of less use, as what is missing there is a solid
>>>>> open source userspace to leverage this. Nonetheless, open source drivers are in
>>>>> the works.
>>>>
>>>> Can you point to the nouveau patches? I still find these HMM patches
>>>> un-reviewable without an upstream consumer.
>>>
>>> So I pushed a branch with WIP for nouveau to use HMM:
>>>
>>> https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-nouveau
>>>
>>
>> Nice to see that.
>> Btw, do you have any plan for a CDM-HMM driver? With CDM the CPU can write to
>> device memory directly without an extra copy.
>
> Yes, nouveau CDM support on PPC (which is the only CDM platform commercially
> available today) is on the TODO list. Note that the driver changes for CDM
> are minimal (probably less than 100 lines of code). From the driver's point
> of view this is memory, and it doesn't matter whether it is CDM or not.
>
> The real burden is on the application developers, who need to update their
> code to leverage this.
>

Why isn't it transparent to the application?
The application just uses the system malloc() and doesn't care whether the data is copied or not.
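
To make the question concrete, here is a minimal sketch of the usage model I have in mind. Note that device_scale() and run_example() are hypothetical names for illustration only: device_scale() stands in for a device kernel launch and simply runs on the CPU here, so this is not any real driver or HMM API.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for a device kernel launch. In this sketch it
 * runs on the CPU; the assumption is that a real device could work on
 * the same pointer without the application copying anything.
 */
static void device_scale(float *buf, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= k;
}

/*
 * Application side: plain malloc(), no device-specific allocator and
 * no explicit copy in or out. Returns element 10 after scaling, or
 * -1.0f if n is too small or the allocation fails.
 */
static float run_example(size_t n)
{
    if (n <= 10)
        return -1.0f;

    float *buf = malloc(n * sizeof(*buf));
    if (!buf)
        return -1.0f;

    for (size_t i = 0; i < n; i++)
        buf[i] = (float)i;

    /* The same malloc()ed pointer is handed to the "device". */
    device_scale(buf, n, 2.0f);

    float result = buf[10];
    free(buf);
    return result;
}
```

The application never says where the pages should live; whether they stay in system memory or get migrated would be decided underneath it.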

>
> Also, as a data point, you want to avoid CPU access to CDM device memory as
> much as possible. The overhead for a single cache line access is high (this
> is PCIe or a derivative protocol, and it is a packet protocol).
>

Thank you for the hint. We are going to follow CDM-HMM since HMM has already been merged upstream.

--
Thanks,
Bob