Noo.. It does, but the normal assumption of 'phys_to_virt' ==
'phys_to_bus' is
not valid anymore. Think of a buffer (swiotlb) which has a pool
of pages and when a PCI device wants a page, it hands one out. It also has
other functionality such as 'mapping' of an already allocated page. If the
PCI device asks the IOMMU (swiotlb) to map a page (and if you have
'swiotlb=force'
the page provided has been allocated above 4GB and the device can only handle
up to 32-bit),
then swiotlb gives out a page from its own pool. You now have
two addresses: the one from the PCI pool (swiotlb) and the one you already
allocated.
You are suppose to program your PCI card to read/write data to the
page provided from the IOMMU (so the swiotlb), which means that it won't
write to the page you had allocated. Hence there are a calls, such as
'sync_page'..
which will copy the contents from the swiotlb page to the one you had
allocated
(or vice-versa). This is called 'bounce buffer'.
The radeon (and nouveau) don't have the code to call the 'sync_page', and
you wouldn't really want to do so - as it slows down the performance of the
machine. There exists another mechanism which is to allocate the pages
at the start, and not do mapping later one.