Re: [PATCH 00/10] HMM updates for 5.1

From: Jerome Glisse
Date: Tue Mar 19 2019 - 16:25:35 EST


On Tue, Mar 19, 2019 at 03:18:49PM -0400, Jerome Glisse wrote:
> On Tue, Mar 19, 2019 at 12:13:40PM -0700, Dan Williams wrote:
> > On Tue, Mar 19, 2019 at 12:05 PM Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Mar 19, 2019 at 11:42:00AM -0700, Dan Williams wrote:
> > > > On Tue, Mar 19, 2019 at 10:45 AM Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Mar 19, 2019 at 10:33:57AM -0700, Dan Williams wrote:
> > > > > > On Tue, Mar 19, 2019 at 10:19 AM Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Tue, Mar 19, 2019 at 10:12:49AM -0700, Andrew Morton wrote:
> > > > > > > > On Tue, 19 Mar 2019 12:58:02 -0400 Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> > > > > > [..]
> > > > > > > > Also, the discussion regarding [07/10] is substantial and is ongoing so
> > > > > > > > please let's push along with that.
> > > > > > >
> > > > > > > I can move it to the end of the series, but it is needed for the ODP
> > > > > > > RDMA conversion too. Otherwise i will just move that code into the ODP
> > > > > > > RDMA code and will have to move it again into HMM code once i am done
> > > > > > > with the nouveau changes, and in the meantime i expect other drivers
> > > > > > > will want to use these 2 helpers too.
> > > > > >
> > > > > > I still hold out hope that we can find a way to have productive
> > > > > > discussions about the implementation of this infrastructure.
> > > > > > Threatening to move the code elsewhere to bypass the feedback is not
> > > > > > productive.
> > > > >
> > > > > > I am not threatening anything. That code is in ODP _today_; with that
> > > > > > patchset i was factoring it out so that i could also use it in nouveau.
> > > > > > nouveau is built in such a way that right now i can not use it directly.
> > > > > > But i wanted to factor it out now in the hope that i can get the nouveau
> > > > > > changes in 5.2 and then convert nouveau in 5.3.
> > > > >
> > > > > So when i said that code will be in ODP it just means that instead of
> > > > > removing it from ODP i will keep it there and it will just delay more
> > > > > code sharing for everyone.
> > > >
> > > > > The point I'm trying to make is that the code sharing for everyone is
> > > > > moving the implementation closer to canonical kernel code and using
> > > > existing infrastructure. For example, I look at 'struct hmm_range' and
> > > > see nothing hmm specific in it. I think we can make that generic and
> > > > not build up more apis and data structures in the "hmm" namespace.
> > >
> > > > Right now i am trying to unify drivers for devices that can support
> > > > the mmu notifier approach through HMM. Unifying to a superset of drivers
> > > > that can not abide by mmu notifier is on my todo list like i said, but
> > > > it comes after. I do not want to make the big jump in just one go. So
> > > > i am doing things under HMM and thus in the HMM namespace, but once i
> > > > tackle the larger set i will move what makes sense to a generic namespace.
> > >
> > > > This exact approach has already happened several times in the kernel. In
> > > > the GPU sub-system we did it several times. First do something for a couple
> > > > of devices that are very similar, then grow to a bigger set of devices and
> > > > generalise along the way.
> > >
> > > > So i do not see what the problem is with me repeating that same pattern
> > > > here again. Do something for a smaller set before tackling it for
> > > > a bigger set.
> >
> > All of that is fine, but when I asked about the ultimate trajectory
> > that replaces hmm_range_dma_map() with an updated / HMM-aware GUP
> > implementation, the response was that hmm_range_dma_map() is here to
> > stay. The issue is not with forking off a small side effort, it's the
> > plan to absorb that capability into a common implementation across
> > non-HMM drivers where possible.
>
> hmm_range_dma_map() is a superset of gup_range_dma_map() because on
> top of gup_range_dma_map() the hmm version deals with mmu notifier.
>
> > But everything that is not mmu notifier related can be shared through
> > gup_range_dma_map(), so the plan is to end up with:
> >     hmm_range_dma_map(hmm_struct) {
> >         hmm_mmu_notifier_specific_prep_step();
> >         gup_range_dma_map(hmm_struct->common_base_struct);
> >         hmm_mmu_notifier_specific_post_step();
> >     }
>
> > ie share as much as possible. Does that not make sense ? To get
> > there i will need to make non-trivial additions to GUP, and so i went
> > first to get the HMM bits working and then work on a common gup API.
>

And more on the hmm_range struct:

struct hmm_range {
	struct vm_area_struct *vma;	// Common
	struct list_head list;		// HMM specific: this is only useful
					// to track valid ranges if a mmu
					// notifier happens while we do
					// the CPU page table lookup
	unsigned long start;		// Common
	unsigned long end;		// Common
	uint64_t *pfns;			// Common
	const uint64_t *flags;		// Some flags would be HMM specific
	const uint64_t *values;		// HMM specific
	uint8_t pfn_shift;		// Common
	bool valid;			// HMM specific
};

So it is not all common; there are things that just do not make sense
outside an HMM-capable driver.
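
To make the split concrete, here is a rough userspace sketch of how the
common fields could sit in a base struct with the HMM-only state wrapped
around it. This is an illustration only, not kernel code and not a
proposed API: struct gup_range, every *_sketch name, the 4K page size and
the placeholder "lookup" are assumptions made up for the example. The
list_head is left out because it has no userspace equivalent here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K pages */

/* Hypothetical common base: only the fields marked "Common" above. */
struct gup_range {
	void *vma;		/* stand-in for struct vm_area_struct * */
	unsigned long start;
	unsigned long end;
	uint64_t *pfns;
	uint8_t pfn_shift;
};

/* Hypothetical HMM wrapper: common base plus mmu notifier related state. */
struct hmm_range_sketch {
	struct gup_range common;
	const uint64_t *flags;
	const uint64_t *values;
	bool valid;
};

/*
 * Shared part: fill one entry per page in [start, end). The identity
 * mapping below is a placeholder for a real CPU page table walk.
 */
static int gup_range_dma_map_sketch(struct gup_range *r)
{
	unsigned long i, npages;

	assert(r->pfns && r->end > r->start);
	npages = (r->end - r->start) >> SKETCH_PAGE_SHIFT;
	for (i = 0; i < npages; i++) {
		uint64_t pfn = (r->start >> SKETCH_PAGE_SHIFT) + i;

		r->pfns[i] = pfn << r->pfn_shift;
	}
	return 0;
}

/* HMM-only prep step: start tracking the range as valid. */
static void hmm_prep_step_sketch(struct hmm_range_sketch *r)
{
	r->valid = true;
}

/* What a mmu notifier callback would do if it hit this range. */
static void hmm_invalidate_sketch(struct hmm_range_sketch *r)
{
	r->valid = false;
}

/* HMM-only post step: the caller must retry if the range was invalidated. */
static int hmm_post_step_sketch(const struct hmm_range_sketch *r)
{
	return r->valid ? 0 : -EAGAIN;
}

/* Wrapper with the prep / common / post layering. */
static int hmm_range_dma_map_sketch(struct hmm_range_sketch *r)
{
	int ret;

	hmm_prep_step_sketch(r);
	ret = gup_range_dma_map_sketch(&r->common);
	if (ret)
		return ret;
	return hmm_post_step_sketch(r);
}
```

A driver without mmu notifier support would only ever touch the base
struct and the common function; the valid flag and the retry on -EAGAIN
only exist in the wrapper.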

Cheers,
Jérôme