RE: [PATCH 1/1] swiotlb: extend buffer pre-padding to alloc_align_mask if necessary

From: Michael Kelley
Date: Fri Mar 15 2024 - 10:59:24 EST


From: Petr Tesařík <petr@xxxxxxxxxxx> Sent: Friday, March 15, 2024 5:26 AM
>
> On Fri, 15 Mar 2024 02:53:10 +0000
> Michael Kelley <mhklinux@xxxxxxxxxxx> wrote:
>
> > From: Petr Tesarik <petrtesarik@xxxxxxxxxxxxxxx> Sent: Tuesday, March 12, 2024 6:42 AM
> > >

[snip]

> > > @@ -1349,6 +1353,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > > return (phys_addr_t)DMA_MAPPING_ERROR;
> > > }
> > >
> > > + /*
> > > + * Calculate buffer pre-padding within the allocated space. Use it to
> > > + * preserve the low bits of the original address according to device's
> > > + * min_align_mask. Limit the padding to alloc_align_mask or slot size
> > > + * (whichever is bigger); higher bits of the original address are
> > > + * preserved by selecting a suitable IO TLB slot.
> > > + */
> > > + offset = orig_addr & dma_get_min_align_mask(dev) &
> > > + (alloc_align_mask | (IO_TLB_SIZE - 1));
> > > index = swiotlb_find_slots(dev, orig_addr,
> > > alloc_size + offset, alloc_align_mask, &pool);
> > > if (index == -1) {
> > > @@ -1364,6 +1377,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> > > * This is needed when we sync the memory. Then we sync the buffer if
> > > * needed.
> > > */
> > > + padding = 0;
> > > + while (offset >= IO_TLB_SIZE) {
> > > + pool->slots[index++].orig_addr = INVALID_PHYS_ADDR;
> > > + pool->slots[index].padding = ++padding;
> > > + offset -= IO_TLB_SIZE;
> > > + }
> >
> > Looping to fill in the padding slots seems unnecessary.
> > The orig_addr field should already be initialized to
> > INVALID_PHYS_ADDR, and the padding field should already
> > be initialized to zero.
>
> Ack.
>
> > The value of "padding" should be just
> > (offset / IO_TLB_SIZE), and it only needs to be stored in the
> > first non-padding slot where swiotlb_release_slots() will
> > find it.
>
> This is also right. I asked myself the question what should happen if
> somebody passes a padding slot's address to swiotlb_tbl_unmap_single().
> Of course, it's an invalid address, but as a proponent of defensive
> programming, I still asked what would be the best response? If I store
> padding in each slot, the whole block is released. If I store it only
> in the first non-padding slot, some slots may leak.
>
> On a second thought, the resulting SWIOTLB state is consistent either
> way, and we don't to care about leaking some slots if a driver is
> buggy. Maybe it's even better, because the leak will be noticed.
>
> In short, I agree, let's keep the code simple.
>

One other thought: Fundamentally, fixing the core problem
requires swiotlb_tbl_unmap_single() to have some information
it doesn't have in the current code. It needs to know the
number of padding slots, so that it can free them correctly.
Your solution is to store the # of padding slots in the
io_tlb_slot array.

Another approach would be to have callers pass the
alloc_align_mask as an argument to swiotlb_tbl_unmap_single().
It can then calculate the offset and the number of padding
slots just like swiotlb_tbl_map_single() does. Nothing
additional would need to be stored in the io_tlb_slot array.
The IOMMU code is the only caller than uses a non-zero
alloc_align_mask. From a quick look at that code, the
unmap path has the iova_mask() available, so that would
work. Other callers would pass zero, just like they do for
swiotlb_tbl_map_single().

I don't immediately have a strong opinion either way, but
it's something to think about a bit more.

Michael