Re: [RFC v1 3/4] swiotlb: Allow dynamic allocation of bounce buffers

From: Petr Tesařík
Date: Fri Apr 07 2023 - 06:16:05 EST


On Fri, 7 Apr 2023 07:57:04 +0200
Christoph Hellwig <hch@xxxxxx> wrote:

> On Tue, Mar 28, 2023 at 02:43:03PM +0200, Petr Tesarik wrote:
> > Oh, wait! I can do at least something for devices which do not use
> > swiotlb at all.
> >
> > If a device does not use bounce buffers, it cannot pass an address
> > that belongs to the swiotlb. Consequently, the potentially
> > expensive check can be skipped. This avoids the dynamic lookup
> > penalty for devices which do not need the swiotlb.
> >
> > Note that the counter always remains zero if dma_io_tlb_mem is
> > NULL, so the NULL check is not required.
>
> Hmm, that's yet another atomic for each map/unmap, and bloats
> struct device.

I'm not sure how bad it is to bloat struct device. It is already quite
large, e.g. in my x86 build it is 768 bytes (exact size depends on
config options), and nobody seems to be concerned...

Regarding the atomic operations, I am currently testing a slightly
different approach, which merely sets a flag if there are any
dynamically allocated bounce buffers. The atomic check becomes an
smp_load_acquire(), and the atomic inc/dec is replaced by an
smp_store_release() which is executed only when the flag changes.
That said, if I hammer this path with heavy parallel I/O, I can still
see some performance cost for devices that use swiotlb, but at least
devices that do not need bounce buffers seem to be unaffected.
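
Roughly, the flag idea could be sketched like this in userspace C11.
This is not the actual patch: all names are hypothetical, the C11
acquire/release operations merely stand in for smp_load_acquire() and
smp_store_release(), and the buffer count is assumed to be protected
by a lock that is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device bounce buffer state. */
struct bounce_state {
	atomic_bool has_dyn;	/* true while any dynamic buffer exists */
	size_t nr_dyn;		/* assumed to be protected by an outer lock */
};

static void bounce_state_init(struct bounce_state *st)
{
	atomic_init(&st->has_dyn, false);
	st->nr_dyn = 0;
}

/* Allocation path: publish the flag only on the 0 -> 1 transition. */
static void dyn_buffer_added(struct bounce_state *st)
{
	if (st->nr_dyn++ == 0)
		atomic_store_explicit(&st->has_dyn, true,
				      memory_order_release);
}

/* Free path: clear the flag only on the 1 -> 0 transition. */
static void dyn_buffer_removed(struct bounce_state *st)
{
	assert(st->nr_dyn > 0);
	if (--st->nr_dyn == 0)
		atomic_store_explicit(&st->has_dyn, false,
				      memory_order_release);
}

/*
 * Map/unmap fast path: a single acquire load. If it returns false,
 * the expensive dynamic lookup can be skipped.
 */
static bool may_have_dyn_buffers(struct bounce_state *st)
{
	return atomic_load_explicit(&st->has_dyn, memory_order_acquire);
}
```

In this sketch, a device that never allocates a dynamic bounce buffer
only ever pays for the acquire load, and the release store happens
only when the first buffer appears or the last one goes away.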

> (Btw, in case anyone is interested, we really need to get started
> on moving the dma fields out of struct device into a sub-struct
> only allocated for DMA capable busses)

I like this idea. In fact, my WIP topic branch now moves the swiotlb
fields into a separate struct, but I can surely go further and move all
DMA-related fields. I doubt it is worth allocating it separately,
though. We are talking about replacing some 100 bytes (in the worst
case) with a pointer to a dynamically allocated struct, but the
dynamic allocator adds some overhead. I believe it pays off only if the
vast majority of struct device instances do not need these DMA fields,
but is that really the case?
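
The trade-off can be put into numbers with a back-of-the-envelope
helper like the one below. It is only a sketch: the function name and
parameters are made up, it counts bytes only, and it ignores the cost
of the extra pointer dereference and of handling allocation failures.

```c
#include <assert.h>

/*
 * Bytes saved (positive) or lost (negative) across nr_devices when the
 * DMA fields move from an embedded sub-struct to a separately allocated
 * one which exists only for the nr_dma_devices that need it.
 *
 * dma_bytes:      size of the DMA fields
 * ptr_bytes:      size of the pointer which replaces them
 * alloc_overhead: assumed per-allocation overhead of the allocator
 */
static long bytes_saved(long nr_devices, long nr_dma_devices,
			long dma_bytes, long ptr_bytes, long alloc_overhead)
{
	long embedded, separate;

	assert(nr_dma_devices >= 0 && nr_dma_devices <= nr_devices);

	/* Every device carries the full set of DMA fields. */
	embedded = nr_devices * dma_bytes;

	/* Every device carries a pointer; only DMA devices allocate. */
	separate = nr_devices * ptr_bytes +
		   nr_dma_devices * (dma_bytes + alloc_overhead);

	return embedded - separate;
}
```

For example, with 100 bytes of DMA fields, and assuming 8-byte pointers
and 16 bytes of allocator overhead, bytes_saved(1000, 0, 100, 8, 16)
returns 92000, whereas bytes_saved(1000, 1000, 100, 8, 16) returns
-24000.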

Petr T