Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

From: Knut Omang
Date: Mon Apr 24 2017 - 03:38:10 EST

Next message: Pavel Machek: "Re: [PATCH] led: ledtrig-transient: replace timer_list with hrtimer"
Previous message: Mike Galbraith: "Re: TREE_SRCU slows hotplug by factor ~16"
Next in thread: Logan Gunthorpe: "Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, 2017-04-17 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2017-04-16 at 10:34 -0600, Logan Gunthorpe wrote:
> >
> > On 16/04/17 09:53 AM, Dan Williams wrote:
> > > ZONE_DEVICE allows you to redirect via get_dev_pagemap() to retrieve
> > > context about the physical address in question. I'm thinking you can
> > > hang bus address translation data off of that structure. This seems
> > > vaguely similar to what HMM is doing.
> >
> > Thanks! I didn't realize you had the infrastructure to look up a device
> > from a pfn/page. That would really come in handy for us.
>
> It does indeed. I won't be able to play with that much for a few weeks
> (see my other email) so if you're going to tackle this while I'm away,
> can you work with Jerome to make sure you don't conflict with HMM ?
>
> I really want a way for HMM to be able to lay out struct pages over the
> GPU BARs rather than in "allocated free space" for the case where the
> BAR is big enough to cover all of the GPU memory.
>
> In general, I'd like a simple & generic way for any driver to ask the
> core to lay out DMA'ble struct pages over BAR space. I am not convinced
> this requires a "p2mem device" to be created on top of this, though, but
> that's a different discussion.
>
> Of course the actual ability to perform the DMA mapping will be subject
> to various restrictions that will have to be implemented in the actual
> "dma_ops override" backend. We can have generic code to handle the case
> where devices reside in the same domain, which can deal with switch
> configuration etc.; we will need IOMMU-specific code to handle the
> case of going through the fabric.
>
> Virtualization is a separate can of worms due to how qemu completely
> fakes the MMIO space; we can look into that later.

My first reflex when reading this thread was to think that this whole domain
lends itself excellently to testing via Qemu. Could it be that doing this in
the opposite direction might be a safer approach in the long run, even though
it means (significantly) more work up-front?

E.g., start by fixing/providing/documenting suitable model(s)
for testing this in Qemu, then implement the patch set based
on those models?

Thanks,
Knut

>
> Cheers,
> Ben.
