Re: [PATCH] iommu: Avoid softlockup and rcu stall in fq_flush_timeout().

From: Jerry Snitselaar
Date: Mon May 22 2023 - 11:19:38 EST


On Mon, May 22, 2023 at 04:58:33PM +0200, Joerg Roedel wrote:
> Hi,
>
> On Fri, Apr 28, 2023 at 11:14:54AM +0530, Vasant Hegde wrote:
> > Ping. Any suggestion on below proposal (schedule work on each CPU to free iova)?
>
> Optimizing the flush-timeout path seems to be working on the symptoms
> rather than the cause. The main question to look into first is why are
> so many CPUs competing for the IOVA allocator lock.
>
> That is a situation which the flush-queue code is there to avoid,
> obviously it does not scale to the workloads tested here. Any chance to
> check why?
>
> My guess is that the allocations are too big and not covered by the
> allocation sizes supported by the flush-queue code. But maybe this is
> something that can be fixed. Or the flush-queue code could even be
> changed to auto-adapt to allocation patterns of the device driver?
>
> Regards,
>
> Joerg

The case I know of involved some proprietary test suites (Hazard I/O
and Medusa?) and the lpfc driver. I was able to force the condition
using fio with a number of jobs running. I'll play around and see if
I can find the point where it starts to become an issue.

I mentioned to the Broadcom folks what the nvme driver did for the max
dma size, but I haven't had a chance to look at it myself yet to see
whether there is somewhere in the lpfc code to fix up.
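
For illustration, here is a minimal userspace sketch of the idea behind
the max dma size limit, assuming the "supported allocation sizes" above
refer to the per-CPU IOVA range caches. As far as I understand it, the
nvme driver caps its maximum transfer size at the value returned by
dma_opt_mapping_size(), so that mappings stay within what the caches
cover and do not fall through to the locked allocator. All sketch_*
names are hypothetical, and the 4 KiB page size and six cached orders
are assumptions modelled on IOVA_RANGE_CACHE_MAX_SIZE. This is not
kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumptions for this sketch: 4 KiB pages and six cached orders
 * (2^0 .. 2^5 pages), modelled on IOVA_RANGE_CACHE_MAX_SIZE. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_PAGE_SIZE         ((size_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_RCACHE_MAX_ORDERS 6

/* Smallest order such that 2^order >= pages. */
static unsigned int sketch_order_base_2(size_t pages)
{
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Largest mapping size (in bytes) that the cache can still serve. */
static size_t sketch_opt_mapping_size(void)
{
	return SKETCH_PAGE_SIZE << (SKETCH_RCACHE_MAX_ORDERS - 1);
}

/* True if a mapping of this many bytes would be served from the cache
 * rather than falling back to the lock-protected allocator. */
static bool sketch_is_cached(size_t bytes)
{
	size_t pages;

	if (bytes == 0)
		return false;
	pages = (bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	return sketch_order_base_2(pages) < SKETCH_RCACHE_MAX_ORDERS;
}

/* What a driver could do: clamp its own maximum transfer size to the
 * cache-friendly limit. */
static size_t sketch_clamp_transfer(size_t driver_max_bytes)
{
	size_t opt = sketch_opt_mapping_size();

	return driver_max_bytes < opt ? driver_max_bytes : opt;
}
```

Under these assumptions the limit works out to 128 KiB, so a driver
that advertises a 1 MiB maximum would be clamped to 128 KiB, while
anything even one byte larger than that limit would miss the cache.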

Regards,
Jerry