Re: Suggestions for how to remove bus_to_virt()

From: Roland Dreier
Date: Wed Jul 12 2006 - 20:17:12 EST


> One solution is to change the IB device driver interface so that
> kernel virtual addresses are passed to the IB device driver and
> the device driver is responsible for calling dma_map_single(), etc.
> I believe this will be unacceptable to the OpenFabrics community

Actually it's worse than unacceptable -- I don't see how this can work
at all. The problem is that addresses are not just passed directly to
the local HCA; they also might be embedded in protocol messages that
are sent to a remote HCA and then used by the remote HCA to initiate
RDMA.

For example, the SRP driver uses ib_get_dma_mr() to get an R_Key,
which it then sends to the target along with a DMA address. The
target uses that R_Key/address to RDMA data directly to or from the
host. There's no good way for the low-level driver to handle the DMA
mapping, since the address is embedded in a protocol message that the
low-level driver knows nothing about.

> Another solution is to change the IB device driver interface to add
> a function which tells the caller what type of addresses the device
> expects. Kernel modules would then be required to pass either a
> dma_map_xxx() address or a kernel virtual address based on the
> driver's preference.
> The current set of IB consumers either start with kmalloc/vmalloc
> memory (such as the MAD layer) or a list of physical pages
> (such as ISER and SRP). The current code could therefore be
> fairly easily changed except for ISER/SRP when a struct page
> doesn't have a direct kernel address (high pages) and would
> need to call kmap()/kunmap() in that case.

I have a few problems with this: first, it's unfortunate that every
consumer needs two code paths to handle the two possibilities; second,
I don't see how it handles the case of SRP's use of the
ib_get_dma_mr() R_Key as above anyway; third, expecting consumers to
kmap pages for a long time across work request execution is a bad
idea.

Maybe the least bad solution would be to add rdma_XXX wrappers around
the dma mapping functions that RDMA consumers use; then most low-level
drivers could just pass them through to the DMA mapping API, while the
ipath driver could handle things itself.

The problem with that is that it ends up wrapping a huge API -- for
example, you need both dma_map_single and dma_map_sg at least, plus
someone might want to use dma_alloc_coherent memory, not to mention
the dma_pool stuff, etc.

A cleaner solution would be to make the dma_ API really use the device
it's passed anyway, and allow drivers to override the standard PCI
stuff nicely. But that would be major surgery, I guess.

(BTW, vmalloc memory should not be used for DMA, since it's not
guaranteed to be DMA-able -- so anyone doing that is just wrong)

- R.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/