Re: [PATCH v1 3/3] swiotlb: Split up single swiotlb lock

From: Chao Gao
Date: Wed Jun 29 2022 - 22:43:08 EST


On Tue, Jun 28, 2022 at 03:01:34PM +0800, Chao Gao wrote:
>From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>
>Traditionally swiotlb was not performance critical because it was only
>used for slow devices. But in some setups, like TDX confidential
>guests, all IO has to go through swiotlb. Currently swiotlb only has a
>single lock. Under high IO load with multiple CPUs this can lead to
>signifiant lock contention on the swiotlb lock. We've seen 20+% CPU
>time in locks in some extreme cases.
>
>This patch splits the swiotlb into individual areas which have their
>own lock. Each CPU tries to allocate in its own area first. Only if
>that fails does it search other areas. On freeing the allocation is
>freed into the area where the memory was originally allocated from.
>
>To avoid doing a full modulo in the main path the number of swiotlb
>areas is always rounded to the next power of two. I believe that's
>not really needed anymore on modern CPUs (which have fast enough
>dividers), but still a good idea on older parts.
>
>The number of areas can be set using the swiotlb option. But to avoid
>every user having to set this option set the default to the number of
>available CPUs. Unfortunately on x86 swiotlb is initialized before
>num_possible_cpus() is available, that is why it uses a custom hook
>called from the early ACPI code.
>
>Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>[ rebase and fix warnings of checkpatch.pl ]
>Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>

Just noticed that Tianyu already posted a variant of this patch.
Will drop this one from my series.