From: Catalin Marinas
Date: Mon Oct 03 2022 - 13:40:27 EST

On Sun, Oct 02, 2022 at 03:24:57PM -0700, Linus Torvalds wrote:
> On Sun, Oct 2, 2022 at 3:09 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > Non-coherent DMA for networking is going to be fun, though.
> I agree that networking is likely the main performance issue, but I
> suspect 99% of the cases would come from __alloc_skb().

The problem is not the allocation but rather having a generic enough
dma_needs_bounce() check. It won't be able to tell whether some 1500
byte range is for network or for crypto code that uses a small
ARCH_KMALLOC_MINALIGN. Getting the actual object size (e.g. with
ksize()) doesn't tell the full story on how safe the DMA is.
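
As a rough illustration of the ambiguity (a user-space sketch, not kernel
code): needs_bounce_sketch() and the 128-byte CACHE_LINE_SKETCH value are
hypothetical names and assumptions made up for this example. A check of
this shape only ever sees an address and a length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for the example; not taken from any real SoC. */
#define CACHE_LINE_SKETCH 128u

/*
 * Hypothetical check: report "bounce" when the start address or the length
 * is not a multiple of the cache line. It has no idea who allocated the
 * buffer or how much padding follows it.
 */
static bool needs_bounce_sketch(uintptr_t addr, size_t len)
{
	/* The mask trick below only works for power-of-two line sizes. */
	assert((CACHE_LINE_SKETCH & (CACHE_LINE_SKETCH - 1)) == 0);

	return ((addr | len) & (CACHE_LINE_SKETCH - 1)) != 0;
}
```

With these assumptions, needs_bounce_sketch(0x1000, 1500) reports a bounce
whether the 1500 bytes sit inside a padded network buffer or inside a
small-aligned object that shares its last cache line with something else.
The arguments are identical in both cases, so the check has to be
conservative.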

> Similarly, that code already has magic stuff to try to be
> cacheline-aligned for accesses, but it's not really for DMA coherency
> reasons, just purely for performance reasons (trying to make sure that
> the header accesses stay in one cacheline etc).

Yeah, __alloc_skb() ends up using SMP_CACHE_BYTES for data alignment
(via SKB_DATA_ALIGN). I have a suspicion this may break on SoCs with a
128-byte cache line but I haven't seen any report yet (there aren't many
such systems).
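
To make the arithmetic concrete, here is a user-space sketch of the
rounding. It assumes SMP_CACHE_BYTES is 64 and that the real cache line is
128 bytes; the *_SKETCH names and helper functions are made up for the
example and only mirror the round-up behaviour of SKB_DATA_ALIGN.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define SMP_CACHE_BYTES_SKETCH	64u	/* alignment used for skb data */
#define LARGE_LINE_SKETCH	128u	/* hypothetical larger cache line */

/* Round x up to the next multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Same rounding as SKB_DATA_ALIGN, using the assumed 64-byte value. */
static size_t skb_data_align_sketch(size_t x)
{
	return align_up(x, SMP_CACHE_BYTES_SKETCH);
}

/* Nonzero if the rounded size also ends on a 128-byte boundary. */
static int ends_on_large_line(size_t x)
{
	return (skb_data_align_sketch(x) % LARGE_LINE_SKETCH) == 0;
}
```

Under these assumptions a 1500-byte request rounds to 1536, which happens
to be a multiple of 128, while a 1600-byte request stays at 1600 and ends
64 bytes into a 128-byte line. In that case whatever follows the data
shares the line with it.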