.. snip ..
>>> 2) What about accounting? In a *non-Xen* environment, will the
>>> number of coherent pages be less than the number of DMA32 pages, or
>>> will dma_alloc_coherent just translate into an alloc_page(GFP_DMA32)?
>>
>> The code in the IOMMUs ends up calling __get_free_pages, which ends up
>> in alloc_pages. So the call does end up in alloc_page(flags):
>>
>> native SWIOTLB (so no IOMMU): GFP_DMA32
>> GART (AMD's old IOMMU): GFP_DMA32
>>
>> For the hardware IOMMUs, the flags change a bit:
>>
>> AMD-Vi: if it is in passthrough mode, it calls it with GFP_DMA32.
>> If it is in DMA translation mode (normal mode), it allocates a zeroed
>> page (__GFP_ZERO) with __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 cleared,
>> and immediately translates the bus address.
>>
>> VT-d: if there is no identity mapping and the PCI device is not one of
>> the special ones (GFX, Azalia), then it passes GFP_DMA32.
>> If it is in identity-mapping state and the device is a GFX or Azalia
>> sound card, then it clears __GFP_DMA and __GFP_DMA32 and immediately
>> translates the bus address.
>>
>> However, the interesting thing is that I passed in NULL as the
>> struct device (not intentionally; I did not want to add more changes
>> to the API), so all of the IOMMUs end up using GFP_DMA32.
>>
>> But it does mess up the accounting with AMD-Vi and VT-d, as they strip
>> the __GFP_DMA32 flag off. That is a big problem, I presume?
>
> Actually, I don't think it's a big problem. TTM allows a small
> discrepancy between allocated pages and accounted pages so that it can
> account on the actual allocation result. IIRC, this means that a
> DMA32 page will always be accounted as such, or at least we can make
> it behave that way. As long as the device can always handle the
> page, we should be fine.

Excellent.
>>> 3) Same as above, but in a Xen environment, what will stop multiple
>>> guests from exhausting the coherent pages? It seems that the TTM
>>> accounting mechanisms will no longer be valid unless the number of
>>> available coherent pages is split across the guests?
>>
>> Say I pass through four ATI Radeon cards (each of them a 32-bit card) to
>> four guests. Let's also assume that we are doing heavy operations in all
>> of the guests. Since there is no communication between the TTM
>> accounting in each guest, you could end up eating all of the 4GB of
>> physical memory that is available to each guest. It could end up that
>> the first guest gets the lion's share of the 4GB of memory, while the
>> others get less.
>>
>> And if one were to do that on bare metal, with four ATI Radeon cards, the
>> TTM accounting mechanism would realize it is nearing the watermark
>> and do... something, right? What would it actually do?
>>
>> I think the error path would be the same in both cases?
>
> Not really. The really dangerous situation is if TTM is allowed to
> exhaust all GFP_KERNEL memory. Then any application or kernel task

Ok, since GFP_KERNEL does not contain the GFP_DMA32 flag,
this should be OK?
> What *might* be possible, however, is that the GFP_KERNEL memory on
> the host gets exhausted due to extensive TTM allocations in the
> guest, but I guess that's a problem for Xen to resolve, not TTM.

Hmm. I think I am missing something here. GFP_KERNEL is any memory,
and GFP_DMA32 is memory from ZONE_DMA32. When we do start
using the PCI API, what happens underneath (so under Linux) is that
"real PFNs" (Machine Frame Numbers) that are above the 0x100000 mark
get swizzled in for the guest's PFNs (this is for the PCI devices
that have the dma_mask set to 32 bits). However, that is a Xen MMU
accounting issue.
> /Thomas

Is there a good test-case for this?

> *) I think gem's flink still is vulnerable to this, though, so it