Re: [PATCH] x86: define arch_vm_get_page_prot to set _PAGE_IOMAPon VM_IO vmas
From: Konrad Rzeszutek Wilk
Date: Fri Oct 22 2010 - 14:04:09 EST
On Fri, Oct 22, 2010 at 09:44:08AM -0700, H. Peter Anvin wrote:
> On 10/22/2010 08:08 AM, Konrad Rzeszutek Wilk wrote:
> >>
> >> Okay, could you clarify this part a bit? Why does the kernel need to
> >> know the difference between "pseudo-physical" and "machine addresses" at
> >> all? If they overlap, there is a problem, and if they don't overlap, it
> >> will be a 1:1 mapping anyway...
> >
> > The flag (_PAGE_IOMAP) is used when we set the PTE so that the MFN value is
> > used instead of the PFN. We need that b/c when a driver does page_to_pfn()
> > it ends up using the PFN as bus address to write out registers data.
> >
> > Without this patch, the page->virt->PFN value is used and the PFN != to real MFN
> > so we end up writing in a memory address that the PCI device has no idea about.
> > By setting the PTE with the MFN, the virt->PFN gets the real MFN value.
> >
> > The drivers I am talking about are mostly, if not all, located in drivers/gpu
> > and it looks that we are missing two more patches to utilize the patch
> > that Jeremy posted.
> >
> > Please note that I am _not_ suggesting that the two patches
> > below should go out - I still need to post them on drm mailing list.
> >
>
> I'm still seriously confused. If I understand this correctly, we're
> talking about DMA addresses here (as opposed to PIO addresses, i.e.
> BARs), right?
Correct. The BARs are ok since they go through the ioremap.
>
> It's the bimodality that really bothers me. I understand of course that
> Xen imposes yet another address remapping layer, but I'm having a hard
> time understanding any conditions under with we would need that layer to
> go away, as long as DMA addresses are translated via the DMA APIs -- and
> if they aren't, then iommus will break, too.
That is it. They aren't using the DMA or PCI API completly(*). Try doing
'iommu=soft swiotlb=force' with your radeon card under baremetal
(I used an ATI ES1000). I think it will grind to halt during the writeback test.
(*): This was with 2.6.34, I haven't touched 2.6.36 and there was an drm/iomem rewrite
so it might be that this now working. The incomplete part of the graphics
drivers was that it would not do pci_dma_sync_*, so when the MFN was programmed in the
GTT/GART (check out radeon_gart_bind, the call to pci_map_page gets the bus address, also
known as MFN). So the GPU would now have a virt->MFN mapping. However, on the CPU side
when the driver writes a texture to virtual address, the mapping is virt->PFN.
So when we kick the GPU to do its magic, the VM on the graphics card would translate
the virtual address to the MFN, which did not have the data that was written by the
kernel to the PFN. In other words *PFN != *MFN, while we need *PFN == *MFN.
There are two ways of making this work:
1). PFN == MFN (this is what Jeremy's patch ends up doing) and under
baremetal it won't affect as baremetal doesn't care what the VM_IO flag
stands for.
2). Add a whole bunch of pci_dma_sync in the appropiate sections in the
graphic drivers.
I am not qualified to do 2) - that code scares me. Also 1) is the easier :-)
I am actually not sure how it works with AMD-Vi or Intel VT-d. I do remember
something about letting certain devices bypass the VT-d, and I think I saw
the nouveau making the DMAR throw a fit.
> As such, I don't grok this page flag and what it does, and why it's
> needed. I'm not saying it's *wrong*, I'm saying the design is opaque to
I hope my explanation cleared the confusion.
> me and I'm not sure it is the right solution.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/