Re: [PATCH] iommu/arm-smmu: Don't disable next-page prefetcher on devices it works on

From: Robin Murphy
Date: Tue Oct 08 2024 - 08:34:02 EST


On 07/10/2024 11:03 am, Pankaj Patil wrote:
On 9/4/2024 1:59 PM, Pankaj Patil wrote:
On 5/17/2024 10:49 PM, Doug Anderson wrote:
Hi,

On Fri, May 17, 2024 at 9:37 AM Will Deacon <will@xxxxxxxxxx> wrote:

Hi Doug,

On Mon, May 13, 2024 at 04:13:47PM -0700, Douglas Anderson wrote:
On sc7180 trogdor devices we get a scary warning at bootup:
arm-smmu 15000000.iommu:
Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK

We spent some time trying to figure out how we were going to fix these
errata and whether we needed to do a firmware update. Upon closer
inspection, however, we realized that the errata don't apply to us.
Specifically, the errata document says that for these errata:
* Found in: r0p0
* Fixed in: r2p2

...and trogdor devices appear to be running r2p4. That means that they
are unaffected despite the scary warning.

The issue is that the kernel unconditionally tries to disable the
prefetcher even on unaffected devices and then warns when it's unable
to.

Let's change the kernel to only disable the prefetcher on affected
devices, which will get rid of the scary warning on devices that are
unaffected. As per the comment the prefetcher is
"not-particularly-beneficial" but it shouldn't hurt to leave it on for
devices where it doesn't cause problems.

Fixes: f87f6e5b4539 ("iommu/arm-smmu: Warn once when the perfetcher errata patch fails to apply")
Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
---

drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)


Just curious, but did you see any performance impact (good or bad) as a
result of this patch? The next-page prefetcher has always looked a little
naive to me and, with a tendency for tiny TLBs in some implementations,
there's a possibility it could do more harm than good.

This patch actually makes no difference on trogdor today other than
getting rid of the scary warning. Specifically on trogdor the
ACR.CACHE_LOCK bit seems to be set so the kernel is unable to change
the setting anyway and has never been able to. We are working on
figuring out how to fix the firmware and then we have to get a
firmware spin before we can really see any changes. I'll keep an eye
out to see if performance numbers change when the firmware uprevs.

BTW: any idea how big of a deal these errata are? We're _just_
finishing a firmware uprev process and there is always pushback
against kicking off a new one unless the issue is important. Given
that we've been living with this issue since devices shipped I'm going
to assume we don't need to rush a firmware update, but if this is
really scary and needs to be addressed sooner we can figure that out.

-Doug

Receiving the warning on pre-silicon platforms as well, despite being unaffected. If merged, it will help in reducing log clutter.
The patch applies cleanly on the tip of linux-next, tag: next-20240904.

Following up on the patch. Please let me know if any additional
changes are required.

Surely at pre-silicon there's really very little excuse for not just fixing the firmware? Anyway, it remains the case that the real issue here is the message and comment being misleadingly over-specific, and I already sent a patch to address that[1].

Thanks,
Robin.

[1] https://lore.kernel.org/linux-iommu/7c426dc0-4fde-4d1e-bb91-538984bd8b59@xxxxxxx/