Re: [PATCH 1/2 V2] iommu/amd: Add workaround for ERBT1312

From: Don Dutile
Date: Tue Apr 23 2013 - 09:22:56 EST


On 04/18/2013 12:28 PM, Joerg Roedel wrote:
On Thu, Apr 18, 2013 at 11:13:19AM -0500, Suravee Suthikulanit wrote:
This workaround is required for both event log and ppr log. Your
patch is only taking care of the event log.

Right, thanks for the notice. Here is the updated patch.

From cebe04596989c4b9001e2c1571c4fb219ea37b99 Mon Sep 17 00:00:00 2001
From: Joerg Roedel<joro@xxxxxxxxxx>
Date: Thu, 18 Apr 2013 17:55:04 +0200
Subject: [PATCH] iommu/amd: Workaround for ERBT1312

Work around an IOMMU hardware bug where clearing the
EVT_INT or PPR_INT bit in the status register may race with
the hardware trying to set it again. When not handled the
bit might not be cleared and we lose all future event or ppr
interrupts.

Reported-by: Suravee Suthikulpanit<suravee.suthikulpanit@xxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Joerg Roedel<joro@xxxxxxxxxx>
---
drivers/iommu/amd_iommu.c | 34 ++++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index f42793d..27792f8 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -700,14 +700,23 @@ retry:

static void iommu_poll_events(struct amd_iommu *iommu)
{
- u32 head, tail;
+ u32 head, tail, status;
unsigned long flags;

- /* enable event interrupts again */
- writel(MMIO_STATUS_EVT_INT_MASK, iommu->mmio_base + MMIO_STATUS_OFFSET);
-
spin_lock_irqsave(&iommu->lock, flags);

+ /* enable event interrupts again */
+ do {
+ /*
+ * Workaround for Erratum ERBT1312
+ * Clearing the EVT_INT bit may race in the hardware, so read
+ * it again and make sure it was really cleared
+ */
+ status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
+ writel(MMIO_STATUS_EVT_INT_MASK,
+ iommu->mmio_base + MMIO_STATUS_OFFSET);
+ } while (status& MMIO_STATUS_EVT_INT_MASK);
+
head = readl(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
tail = readl(iommu->mmio_base + MMIO_EVT_TAIL_OFFSET);

@@ -744,16 +753,25 @@ static void iommu_handle_ppr_entry(struct amd_iommu *iommu, u64 *raw)
static void iommu_poll_ppr_log(struct amd_iommu *iommu)
{
unsigned long flags;
- u32 head, tail;
+ u32 head, tail, status;

if (iommu->ppr_log == NULL)
return;

- /* enable ppr interrupts again */
- writel(MMIO_STATUS_PPR_INT_MASK, iommu->mmio_base + MMIO_STATUS_OFFSET);
-
spin_lock_irqsave(&iommu->lock, flags);

+ /* enable ppr interrupts again */
+ do {
+ /*
+ * Workaround for Erratum ERBT1312
+ * Clearing the PPR_INT bit may race in the hardware, so read
+ * it again and make sure it was really cleared
+ */
+ status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
+ writel(MMIO_STATUS_PPR_INT_MASK,
+ iommu->mmio_base + MMIO_STATUS_OFFSET);
+ } while (status& MMIO_STATUS_PPR_INT_MASK);
+
head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET);
tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET);

Given other threads on this mail list (and I've seen crashes with same problem)
where this type of logging during a flood of IOMMU errors will lock up the machine,
is there something that can be done to break the do-while loop after n iterations
have been exec'd, so the kernel can progress during a crash ?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/