Re: [PATCH 1/1] swiotlb: Fix swiotlb_bounce() to do partial sync's correctly

From: Dominique Martinet
Date: Wed Mar 27 2024 - 02:13:32 EST


Hi Michael,

mhkelley58@xxxxxxxxx wrote on Tue, Mar 26, 2024 at 08:45:48PM -0700:
> From: Michael Kelley <mhklinux@xxxxxxxxxxx>
>
> In current code, swiotlb_bounce() may do partial sync's correctly in
> some circumstances, but may incorrectly fail in other circumstances.
> The failure cases require both of these to be true:
>
> 1) swiotlb_align_offset() returns a non-zero "offset" value
> 2) the tlb_addr of the partial sync area points into the first
> "offset" bytes of the _second_ or subsequent swiotlb slot allocated
> for the mapping
>
> Code added in commit 868c9ddc182b ("swiotlb: add overflow checks
> to swiotlb_bounce") attempts to WARN on the invalid case where
> tlb_addr points into the first "offset" bytes of the _first_
> allocated slot. But there's no way for swiotlb_bounce() to distinguish
> the first slot from the second and subsequent slots, so the WARN
> can be triggered incorrectly when #2 above is true.
>
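
If I follow the failure mode correctly, walking through the old checks
with made-up numbers (mine, purely for illustration):

    /*
     * Illustration only: assume IO_TLB_SIZE == 2048 and the device's
     * min_align_mask is 0xfff, with the mapped buffer's orig_addr
     * ending in ...67f, so the alignment offset is non-zero.
     */
    orig_addr_offset = swiotlb_align_offset(dev, 0, orig_addr);
                                            /* == 0x67f, condition #1 */

    /* partial sync whose tlb_addr is 0x10 bytes into the *second* slot: */
    tlb_offset = tlb_addr & (IO_TLB_SIZE - 1);  /* == 0x010, condition #2 */

    /*
     * 0x010 < 0x67f, so the "Access before mapping start" WARN fires and
     * the bounce is skipped, even though this partial sync is valid; it
     * just does not start in the first slot.
     */
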
> Related, current code calculates an adjustment to the orig_addr stored
> in the swiotlb slot. The adjustment compensates for the difference
> in the tlb_addr used for the partial sync vs. the tlb_addr for the full
> mapping. The adjustment is stored in the local variable tlb_offset.
> But when #1 and #2 above are true, it's valid for this adjustment to
> be negative. In such case the arithmetic to adjust orig_addr produces
> the wrong result due to tlb_offset being declared as unsigned.
>
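
Continuing with the same made-up numbers, the adjustment itself:

    /* with the old "unsigned int tlb_offset" declaration: */
    tlb_offset -= orig_addr_offset;   /* 0x010 - 0x67f wraps to 0xfffff991 */

    /*
     * With tlb_offset declared as a signed int, the same subtraction
     * gives -0x66f.  Since the second slot's stored orig_addr points
     * IO_TLB_SIZE (0x800) bytes past the mapping's orig_addr, adding
     * -0x66f lands on the byte the partial sync actually targets.
     */
    orig_addr += tlb_offset;
    alloc_size -= tlb_offset;
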
> Fix these problems by removing the over-constraining validations added
> in 868c9ddc182b. Change the declaration of tlb_offset to be signed
> instead of unsigned so the adjustment arithmetic works correctly.
>
> Tested with a test-only hack to how swiotlb_tbl_map_single() calls
> swiotlb_bounce(). Instead of calling swiotlb_bounce() just once
> for the entire mapped area, do a loop with each iteration doing
> only a 128 byte partial sync until the entire mapped area is
> sync'ed. Then with swiotlb=force on the kernel boot line, run a
> variety of raw disk writes followed by read and verification of
> all bytes of the written data. The storage device has DMA
> min_align_mask set, and the writes are done with a variety of
> original buffer memory address alignments and overall buffer
> sizes. For many of the combinations, current code triggers the
> WARN statements, or the data verification fails. With the fixes,
> no WARNs occur and all verifications pass.
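
(For anyone else wanting to reproduce this, my reading of that test-only
hack, not Michael's actual code, is roughly the following in
swiotlb_tbl_map_single(), reusing its existing dev/tlb_addr/mapping_size
locals:)

    /*
     * Sketch only: replace the single
     * swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE)
     * call with 128-byte partial syncs, so swiotlb_bounce() also sees
     * tlb_addr values that point into the second and later slots.
     */
    size_t done;

    for (done = 0; done < mapping_size; done += 128)
        swiotlb_bounce(dev, tlb_addr + done,
                       min_t(size_t, 128, mapping_size - done),
                       DMA_TO_DEVICE);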

Thanks for the detailed analysis & test, and sorry I didn't reply to
your mail last week.
For everyone's benefit I'll reply here (the question was "what were the two
bad commits for"): the previous rework introduced a regression with caamjr
on unaligned mappings that I had reported here:
https://lore.kernel.org/lkml/YL7XXNOnbaDgmTB9@xxxxxxxxxxxxxxxxx/#t

(took me a while to find the threads, Link: tags have been sparse...)

Unfortunately that was ages ago, so I don't really remember exactly which
device it was reproduced on... Given the Cc I'm sure Lukas had hit it on the
MNT Reform (i.MX8MQ), but since I did say I tested it, I can probably
reproduce it on my i.MX8MP?
I'll try to reproduce the old bug, and if I manage to, I'll test your fix
over the next week.


>
> Fixes: 5f89468e2f06 ("swiotlb: manipulate orig_addr when tlb_addr has offset")
> Fixes: 868c9ddc182b ("swiotlb: add overflow checks to swiotlb_bounce")
> Signed-off-by: Michael Kelley <mhklinux@xxxxxxxxxxx>
> ---
> This patch is built against the 6.9-rc1 tree plus Petr Tesarik's
> "swiotlb: extend buffer pre-padding to alloc_align_mask" patch
> that's in-flight.
>
> kernel/dma/swiotlb.c | 30 +++++++++++++-----------------
> 1 file changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index d7a8cb93ef2d..d57c8837c813 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -863,27 +863,23 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
> size_t alloc_size = mem->slots[index].alloc_size;
> unsigned long pfn = PFN_DOWN(orig_addr);
> unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
> - unsigned int tlb_offset, orig_addr_offset;
> + int tlb_offset;
>
> if (orig_addr == INVALID_PHYS_ADDR)
> return;
>
> - tlb_offset = tlb_addr & (IO_TLB_SIZE - 1);
> - orig_addr_offset = swiotlb_align_offset(dev, 0, orig_addr);
> - if (tlb_offset < orig_addr_offset) {
> - dev_WARN_ONCE(dev, 1,
> - "Access before mapping start detected. orig offset %u, requested offset %u.\n",
> - orig_addr_offset, tlb_offset);
> - return;
> - }
> -
> - tlb_offset -= orig_addr_offset;
> - if (tlb_offset > alloc_size) {
> - dev_WARN_ONCE(dev, 1,
> - "Buffer overflow detected. Allocation size: %zu. Mapping size: %zu+%u.\n",
> - alloc_size, size, tlb_offset);
> - return;
> - }
> + /*
> + * It's valid for tlb_offset to be negative. This can happen when the
> + * "offset" returned by swiotlb_align_offset() is non-zero, and the
> + * tlb_addr is pointing within the first "offset" bytes of the second
> + * or subsequent slots of the allocated swiotlb area. While it's not
> + * valid for tlb_addr to be pointing within the first "offset" bytes
> + * of the first slot, there's no way to check for such an error since
> + * this function can't distinguish the first slot from the second and
> + * subsequent slots.
> + */
> + tlb_offset = (tlb_addr & (IO_TLB_SIZE - 1)) -
> + swiotlb_align_offset(dev, 0, orig_addr);
>
> orig_addr += tlb_offset;
> alloc_size -= tlb_offset;

--
Dominique Martinet | Asmadeus