Re: [PATCH] iommu/iova: Bettering utilizing cpu_rcaches in no-strict mode

From: zhangzekun (A)
Date: Wed Jun 26 2024 - 03:42:33 EST




On 2024/6/26 2:03, Robin Murphy wrote:
On 2024-06-25 2:29 am, zhangzekun (A) wrote:


On 2024/6/24 21:32, Robin Murphy wrote:

This patch is primarily intended to minimize the chance of the softlockup issue in fq_flush_timeout(), which was already described earlier in [1]; the change has been applied in a commercial kernel [2] for years.

However, later tests show that this single patch is not enough to fix the softlockup issue, since the root cause of the softlockup is contention on the underlying iova_rbtree_lock. In our softlockup scenarios, the average time to acquire this spinlock is about 6 ms.
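
For reference, the path in question looks roughly like this (a paraphrase of
free_iova_fast() in drivers/iommu/iova.c, not verbatim, and the details differ
between kernel versions): fq_flush_timeout() drains every CPU's flush queue
via fq_ring_free(), which ends up here for each entry, and anything the
rcaches cannot absorb falls through to the rbtree under iova_rbtree_lock.

/* Paraphrase of free_iova_fast(); called for every entry drained by
 * fq_ring_free() when the flush-queue timer fires. */
void free_iova_fast(struct iova_domain *iovad, unsigned long pfn,
		    unsigned long size)
{
	if (iova_rcache_insert(iovad, pfn, size))
		return;		/* fast path: absorbed by a per-CPU rcache */

	free_iova(iovad, pfn);	/* slow path: rbtree walk under iova_rbtree_lock */
}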

That should already be fixed, though. The only reason for fq_flush_timeout() to interact with the rbtree at all was due to the notion of a fixed-size depot which could become full. That no longer exists since 911aa1245da8 ("iommu/iova: Make the rcache depot scale better").

Thanks,
Robin.

Hi, Robin,

Commit 911aa1245da8 ("iommu/iova: Make the rcache depot scale better") can reduce the risk of a softlockup, but cannot fix it entirely. We did solve one softlockup issue [1] with that patch, and that is why it has already been backported to our branch. The softlockup we hit recently occurred on a 5.10-based kernel with that patch already backported; it can be found in [2].

Sorry, I was implying some context that I should have made clear - yes, the softlockup can still happen in general if the flush queues are full of IOVAs which are too large for the rcache mechanism at all, so are always freed directly to the rbtree, but then there's no way *this* patch could make any difference to that case either.
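
For reference, the cut-off being described is the size check at the top of
iova_rcache_insert() (paraphrased from mainline, not verbatim): anything at or
above IOVA_RANGE_CACHE_MAX_SIZE has no matching rcache, so it is always freed
straight back to the rbtree.

/* Paraphrase of iova_rcache_insert(): large allocations never fit a
 * per-CPU rcache, so the caller falls back to free_iova(). */
static bool iova_rcache_insert(struct iova_domain *iovad, unsigned long pfn,
			       unsigned long size)
{
	unsigned int log_size = order_base_2(size);

	if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
		return false;	/* too big to cache: caller frees to the rbtree */

	return __iova_rcache_insert(iovad, &iovad->rcaches[log_size], pfn);
}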

Thanks,
Robin.


Yes, this patch can't fix the softlockup issue in that case. For that case, it would be better to move the IOVA-freeing logic in fq_flush_timeout() out to a kthread and add a cond_resched() in it, as sketched below.
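
A minimal sketch of that idea, following the 5.10-era layout of
drivers/iommu/iova.c discussed above (fq_flush_work is a hypothetical new
field that would need to be added to struct iova_domain and set up with
INIT_WORK(); a workqueue item is used here as a stand-in for a dedicated
kthread): the timer callback only kicks the work, and the per-CPU draining
then runs in process context where cond_resched() is allowed.

/* Hypothetical work function: drains the flush queues in process context. */
static void fq_flush_work_fn(struct work_struct *work)
{
	struct iova_domain *iovad = container_of(work, struct iova_domain,
						 fq_flush_work);
	int cpu;

	for_each_possible_cpu(cpu) {
		struct iova_fq *fq = per_cpu_ptr(iovad->fq, cpu);
		unsigned long flags;

		spin_lock_irqsave(&fq->lock, flags);
		fq_ring_free(iovad, fq);
		spin_unlock_irqrestore(&fq->lock, flags);

		cond_resched();	/* legal here, unlike in timer context */
	}
}

static void fq_flush_timeout(struct timer_list *t)
{
	struct iova_domain *iovad = from_timer(iovad, t, fq_timer);

	atomic_set(&iovad->fq_timer_on, 0);
	iova_domain_flush(iovad);

	/* Defer the expensive per-CPU draining instead of doing it here. */
	schedule_work(&iovad->fq_flush_work);
}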

Thanks,
Zekun