Re: [PATCH] cleanup: Add 'struct dev' in the TTM layer to be passed in for DMA API calls.

From: Thomas Hellstrom
Date: Thu Mar 24 2011 - 03:53:19 EST


On 03/23/2011 03:52 PM, Konrad Rzeszutek Wilk wrote:
On Wed, Mar 23, 2011 at 02:17:18PM +0100, Thomas Hellstrom wrote:
On 03/23/2011 01:51 PM, Konrad Rzeszutek Wilk wrote:
I was thinking about this a bit after I found that the PowerPC requires
the 'struct dev'. But I got a question first, what do you do with pages
that were allocated to a device that can do 64-bit DMA and then
move it to a device that can only do 32-bit DMA? Obviously the 32-bit card would
set the TTM_PAGE_FLAG_DMA32 flag, but the 64-bit would not. What is the
process then? Allocate a new page from the 32-bit device and then copy over the
page from the 64-bit TTM and put the 64-bit TTM page?
Yes, in certain situations we need to copy, and if it's necessary in
some cases to use coherent memory with a struct device associated
with it, I agree it may be reasonable to do a copy in that case as
well. I am, however, against making that the default case when
running on bare metal.
This situation could occur on native/baremetal. When you say 'default
case' you mean for every type of page without consulting whether it
had the TTM_PAGE_FLAG_DMA32?
No, basically I mean a device that runs perfectly fine with
alloc_pages(DMA32) on bare metal shouldn't need to be using
dma_alloc_coherent() on bare metal, because that would mean we'd need
to take the copy path above.
I think we got the scenarios confused (or I did at least).
The scenario I used ("I was thinking.."), the 64-bit device would do
alloc_page(GFP_HIGHUSER) and if you were to move it to a 32-bit device
it would have to make a copy of the page as it could not reach the page
from GFP_HIGHUSER.

The other scenario, which I think is what you are using, is that
we have a 32-bit device allocating a page, so TTM_PAGE_FLAG_DMA32 is set
and then if we were to move it to a 64-bit device it would need to be
copied. But I don't think that is the case - the page would be
reachable by the 64-bit device. Help me out please if I am misunderstanding this.

Yes, this is completely correct.

Now, with a struct dev attached to each page in a 32-bit system (coherent memory)
we would need to always copy in the 32-bit case, since you can't hand over pages
belonging to other physical devices.
On bare metal you don't need coherent memory, but in this case you
would need to copy anyway because you chose to allocate coherent memory.
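
To make the cases above concrete, here is a rough user-space model (not TTM
code). All the names (model_dev, model_page, move_needs_copy) are made up for
illustration, and reachability is simplified to comparing the page address
against the device's DMA mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; these are not TTM or DMA API structures. */
struct model_dev {
	uint64_t dma_mask;		/* highest address the device can reach */
};

struct model_page {
	uint64_t phys_addr;		/* where the page lives (simplified) */
	const struct model_dev *owner;	/* non-NULL only for coherent memory */
};

/* Can 'dev' address this page at all? */
static bool page_reachable(const struct model_page *p,
			   const struct model_dev *dev)
{
	assert(p != NULL && dev != NULL);
	return p->phys_addr <= dev->dma_mask;
}

/* Does handing the page over to 'to' require a copy? */
static bool move_needs_copy(const struct model_page *p,
			    const struct model_dev *to)
{
	if (!page_reachable(p, to))
		return true;	/* e.g. a GFP_HIGHUSER page going to a 32-bit device */
	if (p->owner != NULL && p->owner != to)
		return true;	/* coherent page tied to another device */
	return false;		/* plain page within reach: just hand it over */
}
```

In this model a plain DMA32 page moves to a 64-bit device for free, while the
same page allocated as coherent memory for the 32-bit device always needs a
copy, which is the bare-metal cost I'd like to avoid.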

I see a sort of a hackish way around these problems.

Let's say TTM were to try to detect a hypervisor dummy virtual device sitting on the PCI bus. That device would perhaps provide PCI information detailing which GFP masks are needed to
allocate coherent memory. The TTM page pool could then grab that device and create a struct dev to use for allocating "anonymous" TTM BO memory.

Could that be a way forward? The struct dev would then be private to the page pool code, and bare metal wouldn't need to allocate coherent memory, since the virtual device wouldn't be present. The page pool code would need to be updated so that it can also cache coherent pages.
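
Very roughly, and only as a user-space sketch of the idea (the types and
functions below are hypothetical, nothing like this exists in the page pool
today), the pool would decide once at init time which allocation path to use:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: how the pool gets its pages. */
enum pool_alloc_mode {
	POOL_ALLOC_PLAIN,	/* alloc_pages()-style, no struct dev involved */
	POOL_ALLOC_COHERENT,	/* dma_alloc_coherent()-style, via a private dev */
};

/* Hypothetical stand-in for the hypervisor's dummy virtual device. */
struct model_hv_dev {
	unsigned int gfp_hint;	/* assumed: GFP mask info the device advertises */
};

struct model_pool {
	const struct model_hv_dev *hv_dev;	/* NULL on bare metal */
};

/* 'probed' is whatever bus detection found; NULL if no such device. */
static void pool_init(struct model_pool *pool,
		      const struct model_hv_dev *probed)
{
	assert(pool != NULL);
	pool->hv_dev = probed;	/* stays private to the pool code */
}

/* Bare metal keeps the plain path; only a detected device switches it. */
static enum pool_alloc_mode pool_pick_mode(const struct model_pool *pool)
{
	return pool->hv_dev ? POOL_ALLOC_COHERENT : POOL_ALLOC_PLAIN;
}
```

The point of keeping hv_dev inside the pool is that drivers never see a
struct dev, so nothing changes for them when the virtual device is absent.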

Xen would need to create such a device in the guest with a suitable PCI ID that it would be explicitly willing to share with other hypervisor suppliers....

It's ugly, I know, but it might work...

Thomas
