Re: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout

From: Desnes Nunes

Date: Wed Jun 10 2026 - 11:46:40 EST

Hello Michal and IOMMU maintainers,

On Wed, May 27, 2026 at 5:32 AM Michal Pecio <michal.pecio@xxxxxxxxx> wrote:
> Adding Intel IOMMU people.

And thanks a lot for all the help getting this bug to this stage, Michal.

I have just found the solution for the bug.

It has been a while, but it turns out that I had to dig deep into
the IOMMU root and context entries after my last message.

> Context:
>
> Desnes reported xHCI issues duing crash kernel boot after SysRq
> triggered panic. Turns out, the chip gets an IOMMU fault, some other
> devices also do. Faulting address is a successful dma_alloc_coherent()
> allocation in xhci_alloc_erst(), no evidence that it's freed before
> the fault occurs. No problems during normal boot.

Recap:
After I triggered the panic, the system collects a vmcore smoothly, but
does not reboot afterwards.

The crashkernel was not completing a TRB_ENABLE_SLOT command in
xhci_alloc_dev() in xhci.c, which made xhci hold the xhci->lock and
block a systemd kworker that was also waiting for that lock on
device_shutdown().
Based on existing code, I created a first patch that aborted the
TRB_ENABLE_SLOT command with wait_for_completion_timeout() and killed
the HC - this released the lock and enabled the kworker to finish.
However, after a few messages and test patches with Michal, we weren't
totally sure this was a good solution, mostly because we noticed that
HSE was already set well before ever reaching that TRB_ENABLE_SLOT
command.

IOMMU:
At that moment, I noticed that DMAR faults were happening to the
e1000e and xhci drivers, which share the same bus. Michal noticed that
the faulting address was a successful dma_alloc_coherent() allocation
in xhci_alloc_erst(). This started to point to an IOMMU bug.

Afterwards, I tested booting the crashkernel with `intel_iommu=off`
and confirmed that it rebooted smoothly after the vmcore was captured, so
the plot thickened even more for an IOMMU bug.

Digging deeper, I started to suspect that the copy of bus 128's root
entry was somehow being changed to have the Present bit cleared. This
made me write a v2 patch for the bug, now in iommu, which, instead of
copying the root-entry table, disabled translation and allocated a
clean root-entry table immediately when running a kdump kernel: no DMAR
faults occurred and the reboot went smoothly.

Following this lead, I patched the code to dump the root entries table
using DMAR_RTADDR_REG before calling iommu_alloc_root_entry() (in
init_dmars() at drivers/iommu/intel/iommu.c), while also observing bus
0x80's root entry right at the end of copy_translation_tables():

[Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 0]:
lo=0x000000018a67f001 (P=1 LCTP=0x000000018a67f000)
hi=0x00000001f855d001 (P=1 UCTP=0x00000001f855d000)
[Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 1]:
lo=0x00000001f8562001 (P=1 LCTP=0x00000001f8562000)
hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000)
[Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 2]:
lo=0x00000001f8564001 (P=1 LCTP=0x00000001f8564000)
hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000)
[Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 3]:
lo=0x00000001f8566001 (P=1 LCTP=0x00000001f8566000)
hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000)
[Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 4]:
lo=0x00000001f8569001 (P=1 LCTP=0x00000001f8569000)
hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000)
[Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 5]:
lo=0x00000001f856b001 (P=1 LCTP=0x00000001f856b000)
hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000)
=> [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[128]:
lo=0x0000000000000000 (P=0 LCTP=0x0000000000000000)
hi=0x00000001f856d001 (P=1 UCTP=0x00000001f856d000)
[Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[129]:
lo=0x00000001f8583001 (P=1 LCTP=0x00000001f8583000)
hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000)
[Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: 8 of 256 root entries non-zero
[Wed Jun 3 15:40:37 2026] DMAR: Translation already enabled -
trying to copy translation structures
=> [Wed Jun 3 15:40:37 2026] DMAR: dmar1: post-copy root[128]:
lo=0xe89cda001 hi=0x0
[Wed Jun 3 15:40:37 2026] DMAR: Copied translation tables from
previous kernel for dma
...
[Wed Jun 3 15:40:37 2026] DMAR: DRHD: handling fault status reg 3
=> [Wed Jun 3 15:40:37 2026] DMAR: [DMA Write NO_PASID] Request
device [80:1f.6] fault addr 0x1aba8d000 [fault reason 0x39] SM:
Present bit in Root Entry is clear
[Wed Jun 3 15:40:37 2026] DMAR: Dump dmar1 table entries for IOVA
0x1aba8d000
=> [Wed Jun 3 15:40:37 2026] DMAR: scalable mode root entry: hi
0x0000000000000000, low 0x0000000e89cda001
=> [Wed Jun 3 15:40:37 2026] DMAR: context table is not present
[Wed Jun 3 15:40:37 2026] DMAR: DRHD: handling fault status reg 3
[Wed Jun 3 15:40:37 2026] DMAR: [DMA Write NO_PASID] Request
device [80:1f.6] fault addr 0x1aba89000 [fault reason 0x39] SM:
Present bit in Root Entry is clear
[Wed Jun 3 15:40:37 2026] DMAR: Dump dmar1 table entries for IOVA
0x1aba89000
=> [Wed Jun 3 15:40:37 2026] DMAR: scalable mode root entry: hi
0x0000000000000000, low 0x0000000e89cda001
=> [Wed Jun 3 15:40:37 2026] DMAR: context table is not present
...
=> [Wed Jun 3 15:40:44 2026] xhci_hcd 0000:80:14.0: alloc ERST at
0x0000000e8a356000
...
[Wed Jun 3 15:40:44 2026] DMAR: DRHD: handling fault status reg 2
=> [Wed Jun 3 15:40:44 2026] DMAR: [DMA Read NO_PASID] Request device
[80:14.0] fault addr 0xe8a356000 [fault reason 0x39] SM: Present bit
in Root Entry is clear
[Wed Jun 3 15:40:44 2026] DMAR: Dump dmar1 table entries for IOVA
0xe8a356000
=> [Wed Jun 3 15:40:44 2026] DMAR: scalable mode root entry: hi
0x0000000000000000, low 0x0000000e89cda001
=> [Wed Jun 3 15:40:44 2026] DMAR: context table is not present

In scalable mode, a PCI bus may populate only the upper root half
(UCTP) when all devices on that bus have devfn >= 0x80. On bus 0x80, I
have e1000e at 80:1f.6 (devfn 0xfe) and xHCI at 80:14.0 (devfn 0xa0),
so the hardware root entry correctly has lo=0 and hi=UCTP present.
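
As a small illustration of the arithmetic above, here is a user-space
sketch (not kernel code). pci_devfn() mirrors the encoding of the
kernel's PCI_DEVFN() macro; sm_root_half() is a hypothetical helper
name used only for this example:

```c
#include <stdint.h>

/* Same encoding as PCI_DEVFN(): 5-bit slot in bits 7:3, 3-bit function in bits 2:0. */
static inline uint8_t pci_devfn(uint8_t slot, uint8_t func)
{
	return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

/*
 * Hypothetical helper: which half of a scalable-mode root entry
 * covers this devfn. 0 = lower half (LCTP), 1 = upper half (UCTP).
 */
static inline int sm_root_half(uint8_t devfn)
{
	return devfn >= 0x80 ? 1 : 0;
}
```

With these, pci_devfn(0x1f, 6) gives 0xfe and pci_devfn(0x14, 0) gives
0xa0, and both map to the upper half.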

However, after copy_translation_tables(), I noticed that root[128].hi
was zeroed out (Present bit cleared), while root[128].lo now held a
present pointer (a different address there is expected after a copy).
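
For reference, the P and CTP fields printed in the debug dump can be
decoded as in the sketch below. This is a user-space illustration that
assumes, as the dump itself shows, that bit 0 is the Present bit and the
table pointer is 4 KiB aligned; the struct and function names are made
up for this example:

```c
#include <stdint.h>

#define ROOT_HALF_PRESENT	0x1ULL
#define ROOT_HALF_CTP_MASK	(~0xfffULL)	/* 4 KiB aligned pointer */

/* Hypothetical decoded view of one 64-bit half (lo or hi) of a root entry. */
struct root_half {
	int present;	/* Present bit (bit 0) */
	uint64_t ctp;	/* context-table pointer (LCTP or UCTP) */
};

static inline struct root_half decode_root_half(uint64_t val)
{
	struct root_half h;

	h.present = (int)(val & ROOT_HALF_PRESENT);
	h.ctp = val & ROOT_HALF_CTP_MASK;
	return h;
}
```

Applied to the post-copy values of root[128], lo=0xe89cda001 decodes to
P=1 with a pointer of 0xe89cda000, while hi=0x0 decodes to P=0, which is
the opposite of what the hardware table had before the copy.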

In short, the culprit is the zeroed LCTP: in copy_context_table(), the
allocation of new_ce for the LCTP context entries currently governs the
pos variable, which is later used to save the UCTP new_ce at
tbl[tbl_idx + pos].
On the first iteration idx is zero and old_ce_phys is empty, so the loop
skips straight to devfn=0x80. At devfn 0x80, idx wraps to 0 again
((devfn * 2) mod 256), but since no new_ce was previously allocated for
the LCTP context entries, pos remains zero while the UCTP context entries
are copied. After all upper context entries are saved, tbl receives the
UCTP new_ce at tbl[tbl_idx + 0], and not at tbl[tbl_idx + 1]. These are
later written in copy_translation_tables() to iommu->root_entry[bus].lo
and iommu->root_entry[bus].hi, which causes the bug.
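
The bookkeeping described above can be reproduced with a simplified
user-space model. It is only a sketch: it keeps the loop structure and
the pos handling of the current copy_context_table() for the scalable
mode case, but replaces memremap(), the allocation and the entry copy
with a tag that records which half the table came from. The enum and
the function name are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

enum { HALF_NONE = 0, HALF_LCTP = 1, HALF_UCTP = 2 };

/*
 * Model of the pre-patch slot logic. slots[0] and slots[1] stand in
 * for tbl[tbl_idx + 0] and tbl[tbl_idx + 1].
 */
static void model_copy_pre_patch(bool lctp_present, bool uctp_present,
				 int slots[2])
{
	int pos = 0, devfn, idx;
	int new_ce = HALF_NONE;	/* stands in for the new_ce pointer */

	slots[0] = slots[1] = HALF_NONE;

	for (devfn = 0; devfn < 256; devfn++) {
		idx = (devfn * 2) % 256;	/* scalable mode index */

		if (idx == 0) {
			bool present;

			/* Save the table finished so far, if any */
			if (new_ce != HALF_NONE) {
				slots[0] = new_ce;
				pos = 1;
			}

			present = devfn < 0x80 ? lctp_present : uctp_present;
			if (!present) {
				if (devfn == 0) {
					devfn = 0x7f;	/* No LCTP, try UCTP */
					continue;
				}
				return;		/* "goto out" in the driver */
			}

			new_ce = devfn < 0x80 ? HALF_LCTP : HALF_UCTP;
		}
		/* per-entry copy omitted in this model */
	}

	slots[pos] = new_ce;	/* pos is still 0 if no LCTP table was saved */
}
```

Calling model_copy_pre_patch(false, true, slots) leaves the UCTP table
in slots[0] and nothing in slots[1], which matches what I see on bus
0x80. With both halves present, the tables land in the expected slots.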

In summary, the hardware tables were correct, but the copy path
misplaced the UCTP table for bus 0x80 when handling a zeroed LCTP
during kdump.

To fix this, I created a v3 patch that uses devfn to better track
which half we are copying, so UCTP-only buses (lo=0, hi=P) are
installed into the upper root half.
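
The devfn-based tracking can be modelled in user space as below. This is
a sketch of the idea only, under the same simplifications as any such
model (no memremap(), no allocation, no entry copy, scalable mode
assumed); the enum and the function name are hypothetical, while
tbl_slot is the variable name used in the patch:

```c
#include <stdbool.h>
#include <stdint.h>

enum { SLOT_EMPTY = 0, SLOT_LCTP = 1, SLOT_UCTP = 2 };

/*
 * Model of the patched slot logic. slots[0] and slots[1] stand in
 * for tbl[tbl_idx + 0] and tbl[tbl_idx + 1].
 */
static void model_copy_patched(bool lctp_present, bool uctp_present,
			       int slots[2])
{
	int tbl_slot = 0, devfn, idx;
	int new_ce = SLOT_EMPTY;	/* stands in for the new_ce pointer */

	slots[0] = slots[1] = SLOT_EMPTY;

	for (devfn = 0; devfn < 256; devfn++) {
		idx = (devfn * 2) % 256;	/* scalable mode index */

		if (idx == 0) {
			bool present;

			/* tbl_slot still describes the half just finished */
			if (new_ce != SLOT_EMPTY)
				slots[tbl_slot] = new_ce;

			present = devfn < 0x80 ? lctp_present : uctp_present;
			if (!present) {
				if (devfn == 0) {
					devfn = 0x7f;	/* No LCTP, try UCTP */
					continue;
				}
				return;		/* "goto out" in the driver */
			}

			/* Track which half is being copied from now on */
			tbl_slot = devfn >= 0x80 ? 1 : 0;
			new_ce = devfn < 0x80 ? SLOT_LCTP : SLOT_UCTP;
		}
		/* per-entry copy omitted in this model */
	}

	slots[tbl_slot] = new_ce;
}
```

Here model_copy_patched(false, true, slots) puts the UCTP table in
slots[1] and leaves slots[0] empty, while the case with both halves
present behaves as before.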

I am doing some final tests now, but since this was a lot to digest,
comments at this stage will be most appreciated.

To IOMMU maintainers: should I send this patch to the iommu mailing
list and move the discussion there?

Thanks in advance for any help on the matter,

Best Regards,

Desnes
From 4b283de2c156e270d16e292acb10884f42c0bd12 Mon Sep 17 00:00:00 2001
From: Desnes Nunes <desnesn@xxxxxxxxxx>
Date: Tue, 9 Jun 2026 23:57:05 -0300
Subject: [PATCH RFC] iommu/vt-d: Fix UCTP context table slot when copying root entries

When translation is already enabled at boot (e.g. kdump), the vt-d driver
copies context tables from the previous kernel's root table. In scalable
mode, buses that only populate the upper root half (UCTP, devfn >= 0x80)
should be written to ctxt_tbls[tbl_idx + 1] through copy_context_table().
However, the current copy path always uses tbl[tbl_idx + 0] in this
situation. When the LCTP is zero, new_ce is still NULL when idx wraps to 0
at devfn 0x80, so pos stays 0. Thus, UCTP entries are copied into
tbl[tbl_idx + 0] instead of tbl[tbl_idx + 1], and are afterwards written to
root_entry[bus].lo instead of .hi in copy_translation_tables().

As a consequence, devices on bus 0x80 with devfn >= 0x80 fail DMA with
fault reason 0x39, which breaks drivers running in kernels with translation
pre-enabled. This fixes NO_PASID DMAR faults for UCTP-only buses such as:

DMAR: [DMA Read NO_PASID] Request device [80:14.0] fault addr 0xe81759000 [fault reason 0x39] SM: Present bit in Root Entry is clear

Fixes: 091d42e43d21 ("iommu/vt-d: Copy translation tables from old kernel")
Signed-off-by: Desnes Nunes <desnesn@xxxxxxxxxx>
---
drivers/iommu/intel/iommu.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4d0e65bc131d..737936f942a0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1443,7 +1443,7 @@ static int copy_context_table(struct intel_iommu *iommu,
struct context_entry **tbl,
int bus, bool ext)
{
- int tbl_idx, pos = 0, idx, devfn, ret = 0, did;
+ int tbl_idx, tbl_slot = 0, idx, devfn, ret = 0, did;
struct context_entry *new_ce = NULL, ce;
struct context_entry *old_ce = NULL;
struct root_entry re;
@@ -1459,10 +1459,9 @@ static int copy_context_table(struct intel_iommu *iommu,
if (idx == 0) {
/* First save what we may have and clean up */
if (new_ce) {
- tbl[tbl_idx] = new_ce;
+ tbl[tbl_idx + tbl_slot] = new_ce;
__iommu_flush_cache(iommu, new_ce,
VTD_PAGE_SIZE);
- pos = 1;
}

if (old_ce)
@@ -1484,6 +1483,9 @@ static int copy_context_table(struct intel_iommu *iommu,
}
}

+ /* Track if saving UCTP or LCTP entries in scalable mode */
+ tbl_slot = ext && devfn >= 0x80 ? 1 : 0;
+
ret = -ENOMEM;
old_ce = memremap(old_ce_phys, PAGE_SIZE,
MEMREMAP_WB);
@@ -1512,7 +1514,7 @@ static int copy_context_table(struct intel_iommu *iommu,
new_ce[idx] = ce;
}

- tbl[tbl_idx + pos] = new_ce;
+ tbl[tbl_idx + tbl_slot] = new_ce;

__iommu_flush_cache(iommu, new_ce, VTD_PAGE_SIZE);

--
2.54.0