RE: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

From: Michael Kelley
Date: Mon Apr 01 2024 - 11:05:28 EST


From: Dominique Martinet <dominique.martinet@xxxxxxxxxxxxxxxxx> Sent: Monday, April 1, 2024 12:56 AM
>
> Michael Kelley wrote on Sat, Mar 30, 2024 at 04:16:30AM +0000:
> > From: Dominique Martinet <dominique.martinet@xxxxxxxxxxxxxxxxx>
> > Sent: Friday, March 29, 2024 7:56 PM
> > > There are two things I don't understand here:
> > > 1/ Why would orig_addr come from slot[1]?
> > >
> > > We have index = (tlb_addr - mem->start) >> IO_TLB_SHIFT,
> > > so index = (33 - 7) >> 5 = 26 >> 5 = 0
> > >
> > > As such, orig_addr = mem->slots[0].orig_addr and we'd need the offset to
> > > be 30, not -2?
> >
> > mem->start is the physical address of the global pool of
> > memory allocated for swiotlb buffers.
>
> Argh.
> Okay, that clears up a misunderstanding I've had since day one... I
> should have done a little more reading there.
> I've re-checked now and indeed mem->start comes from the pool init, and
> corresponds to the reserved memory base.
> (I'm not actually 100% sure reserved memory has to be aligned, but at the
> very least I've never seen any that isn't on my hardware so I'll pretend
> it must be without checking)

That memory is allocated in swiotlb_memblock_alloc() with a
call to memblock_alloc(). The second parameter passed to
memblock_alloc() is PAGE_SIZE, which is the required alignment.
There's a subtle dependency here: that alignment must be >= the
largest minimum alignment any driver ever requires. Only a few
drivers specify a min alignment, and the largest value used is 4K,
so passing PAGE_SIZE satisfies the dependency.
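
For reference, the allocation is essentially this (paraphrased from
swiotlb_memblock_alloc() in kernel/dma/swiotlb.c; details vary by
kernel version):

	/*
	 * Allocate the global swiotlb pool. The second argument to
	 * memblock_alloc() is the alignment, so the pool start
	 * (mem->start) is always PAGE_SIZE-aligned.
	 */
	bytes = PAGE_ALIGN(nslabs << IO_TLB_SHIFT);
	tlb = memblock_alloc(bytes, PAGE_SIZE);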

>
> That makes much more sense with your fix, I agree offset must be
> negative in that case, and it'll work out.
>
> > > Well, either works - if we fix index to point to the next slot in the
> > > negative case that's also acceptable if we're sure it's valid, but I'm
> > > worried it might not be in cases where there was only one slot, e.g.
> > > mapping [7; 34]: calling with 33 size 2 would try to access slot 1 with
> > > a negative offset in your example, but slot[0] is the last valid slot.
> >
> > Right, but there wouldn't be one slot mapping [7; 34] if the
> > alignment rules are followed when the global swiotlb memory
> > pool is originally created. The low order IO_TLB_SHIFT bits
> > of slot physical addresses must be zero for the arithmetic
> > using shifts to work, so [7; 34] will cross a slot boundary and
> > two slots are needed.
>
> Yes, since mem->start and the slots belong to the pool and not to the
> mapping, this didn't make sense either; there's no problem here.
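
Right. For reference, the slot lookup in swiotlb_bounce() is
essentially this (paraphrased from kernel/dma/swiotlb.c; exact code
varies by kernel version):

	/*
	 * mem->start and every slot boundary are IO_TLB_SIZE-aligned,
	 * so the low IO_TLB_SHIFT bits of a slot's physical address
	 * are zero and the shift yields the slot index directly.
	 */
	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
	phys_addr_t orig_addr = mem->slots[index].orig_addr;
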
>
> > > 2/ Why is orig_addr 37 the correct address to use for memcpy, and not
> > > 33? I'd think it's off by a "minimum alignment page"; to me this
> > > computation only works if the dma_get_min_align size is bigger than
> > > the IO TLB size.
> >
> > The swiotlb mapping operation establishes a pair-wise mapping between
> > an orig_addr and tlb_addr, with the mapping extending for a specified
> > number of bytes. Your example started with orig_addr = 7, and I
> > posited that the mapping extends for 40 bytes.
>
> Sure.
>
> > I further posited that the tlb_addr returned by
> > swiotlb_tbl_map_single() would be 3 to meet the min alignment
> > requirement (which again only works if mem->start is 0).
>
> Okay, that's where I'm lost.
> 1/ I agree that swiotlb_bounce() called from swiotlb_tbl_map_single()
> cannot be called with a tlb_addr past a single segment (which I'm not
> sure is acceptable in itself: taking the real value of 2KB for IO TLB
> "pages", if the device requires 512-byte alignment, won't you be unable
> to access [512, 2048)?)

The swiotlb_bounce() call within swiotlb_tbl_map_single() is always
made with the tlb_addr that swiotlb_tbl_map_single() will return to
the calling driver, and for a size that is the mapping size passed as
a parameter to swiotlb_tbl_map_single(). In other words, that
swiotlb_bounce() call is never a partial sync -- it always syncs the
entire mapped area. And there's no problem with accessing the entire
mapped area, because swiotlb_bounce() can't validate the tlb_addr
that's passed in. It _can_ validate the size, but the size passed in
is always valid when it is bouncing exactly the entire mapped area.
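
For context, the map-time bounce is essentially this (paraphrased
from swiotlb_tbl_map_single(); exact code varies by kernel version):

	/*
	 * tlb_addr is the value about to be returned to the driver,
	 * and mapping_size is the full size of the mapping, so this
	 * bounce always covers exactly the entire mapped area.
	 */
	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
	return tlb_addr;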

> 2/ swiotlb_bounce() can be called from swiotlb_sync_single_for_device()
> or swiotlb_sync_single_for_cpu() with no alignment check on tlb_addr;
> we're just trusting that they only ever pass addresses within the same
> constraints?

Yes, this is the case that is at issue in this entire discussion. When
the calling device driver is doing a sync of only part of a mapped
area (i.e., a "partial sync"), the tlb_addr passed to swiotlb_bounce()
might not be the tlb_addr returned by swiotlb_tbl_map_single(). It
might be that tlb_addr plus some increment. And indeed,
there's no way for swiotlb_bounce() to validate that the tlb_addr
plus the increment is really within the swiotlb slots allocated for
the mapping. So the swiotlb code is trusting that the calling
device driver isn't doing something bogus.
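
To illustrate, a partial sync from a hypothetical driver might look
like this (the buffer size and offsets are made up):

	/* Map a 4000-byte receive buffer; swiotlb may bounce it. */
	dma_addr_t dma = dma_map_single(dev, buf, 4000, DMA_FROM_DEVICE);

	/*
	 * Sync only bytes 2048..3071 of the mapping. The tlb_addr that
	 * reaches swiotlb_bounce() is the mapped address plus 2048, not
	 * the value swiotlb_tbl_map_single() returned, and swiotlb has
	 * no way to check it against the original mapping.
	 */
	dma_sync_single_for_cpu(dev, dma + 2048, 1024, DMA_FROM_DEVICE);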

But in the bigger picture, swiotlb_tbl_map_single() returns a
tlb_addr to the device driver. We already assume that the device
driver will pass that value to the actual device for it to do the
DMA. If the device driver is badly behaved and passes a bad
tlb_addr to the device for the DMA, it can already wreak havoc.
So not being able to validate the tlb_addr passed to
swiotlb_bounce() doesn't really make things any worse. Drivers
must be well-behaved.

>
> If you assume the tlb_addr can only go from the start of a slot to as
> far as the min alignment allows, then I agree there's no problem, but
> I don't understand where that constraint comes from.

No, the code doesn't assume that. The tlb_addr and size passed to
swiotlb_bounce() via one of the "swiotlb_sync_*()" calls can
legitimately identify any subset of the originally mapped area. If
the originally mapped area occupies 10 slots in the swiotlb, then
swiotlb_bounce() can sync any subset range across those 10 slots
(starting after the minimum alignment "padding" in the 1st slot).
For example, the tlb_addr might point into the 9th of the 10 slots,
as long as that tlb_addr plus the size doesn't extend past the end
of the originally mapped area.
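
Concretely, with the real 2KB slot size: a 20000-byte mapping
occupies 10 slots. If swiotlb_tbl_map_single() returned tlb_addr T,
a sync of 500 bytes at T + 17000 lands in the 9th slot and is fine,
since 17000 + 500 <= 20000. A sync of 500 bytes at T + 19800 would
run past the end of the mapped area -- exactly the kind of bogus
request that swiotlb_bounce() has no way to detect.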

>
>
> Either way, that's not a new problem, and the old checks weren't making
> this any better, so as far as I'm concerned this patch is progress:
> Reviewed-by: Dominique Martinet <dominique.martinet@xxxxxxxxxxxxxxxxx>
>

Thanks.

>
> Thanks for taking me by the hand here; if you want to keep discussing
> this I'll be happy to give it a little bit more time but I might be a
> little bit slower.

No problem.

> --
> Dominique