Re: [PATCH] usb: xhci-pci: Disable 64-bit DMA for VIA VL805

From: Xincheng Zhang

Date: Wed Jun 24 2026 - 22:16:57 EST


On 2026-06-25 02:04 +0200, Michal Pecio wrote:
> By the way, are you sure that 64GB is the magic number and not 1TB?
> I booted a common AMD64 box with iommu.forcedac=1 and instantly got
> IOMMU faults, but the addresses were truncated to 40 bits, not 36.
>
> I applied 40 bit DMA mask and my VL805 seems to work. I looked into
> debugfs and many things are mapped close to 1TB, so I wonder if some
> chips are better than others or maybe there are particular workloads
> where VL805 truncates something to 36 bits? I tried a few, including
> bulk, interrupt, isochronous and USB3 bulk streams.
>
> How was this problem found? Do you have >64GB RAM and no IOMMU?
> Or with IOMMU whose driver allocates mappings above 4GB?

Hi Michal,

Thanks for testing this.

I re-tested this on our side and collected more details. The machine is an
UltraRISC DP1000 riscv64 system with a VIA VL805/806 controller:

0002:01:00.0 USB controller [0c03]: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller [1106:3483] (rev 01)

The booted kernel was built with xHCI debug enabled and with the VL805
XHCI_NO_64BIT_SUPPORT quirk removed, so the controller was allowed to use
the normal AC64/64-bit DMA path.

This is a no-IOMMU setup, and the command line does not include
iommu.forcedac.

The system has about 64 GB of RAM:

MemTotal: 65837388 kB

With the 64-bit DMA path enabled, xHCI setup logged DMA addresses above
0x1000000000:

xhci_hcd 0002:01:00.0: Enabling 64-bit DMA addresses.
xhci_hcd 0002:01:00.0: // Device context base array address = 0x0x0000001075cef000 (DMA), (____ptrval____) (virt)
xhci_hcd 0002:01:00.0: First segment DMA is 0x0x0000000108961000
xhci_hcd 0002:01:00.0: ERST deq = 64'h1075ca7000
xhci_hcd 0002:01:00.0: Slot 1 output ctx = 0x0x0000001075387000 (dma)
xhci_hcd 0002:01:00.0: Output Context DMA address = 0x1075387000

Shortly afterwards the controller stopped responding:

xhci_hcd 0002:01:00.0: Command timeout, USBSTS: 0x00000000
xhci_hcd 0002:01:00.0: Command timeout
xhci_hcd 0002:01:00.0: Abort command ring
xhci_hcd 0002:01:00.0: Abort failed to stop command ring: -110
xhci_hcd 0002:01:00.0: Host halt failed, -110
xhci_hcd 0002:01:00.0: xHCI host controller not responding, assume dead
xhci_hcd 0002:01:00.0: HC died; cleaning up
xhci_hcd 0002:01:00.0: Timeout while waiting for setup device command

With the 36-bit DMA mask applied earlier, the same controller initialized
and enumerated normally on this machine.

> By the way, are you sure that 64GB is the magic number and not 1TB?
> [...] the addresses were truncated to 40 bits, not 36.

I'm not claiming 64 GB is a hard architectural limit for all VL805/806
parts. On this particular controller (rev 01), the failing addresses
(e.g. 0x1075cef000) are only just above the 64 GiB / 36-bit boundary but
well within 40 bits, and the controller still died. So the effective limit
on this part appears to be lower than the 40 bits you observed.

> I applied 40 bit DMA mask and my VL805 seems to work. [...] so I wonder if
> some chips are better than others [...]

That matches my suspicion - this looks like it may be silicon/revision
dependent. Your part tolerating ~1 TB with a 40-bit mask while mine fails
just above 36 bits is a strong hint that the two chips behave differently.

> How was this problem found? Do you have >64GB RAM and no IOMMU?
> Or with IOMMU whose driver allocates mappings above 4GB?

The former: this box has ~64 GB of RAM and no IOMMU, so the buffers are
simply placed at their real physical addresses, some of which fall above
0x1000000000 (64 GiB). forcedac is not involved here either.

One practical limitation on my side: this DP1000 only has 64 GB of RAM, so I
cannot generate physical addresses anywhere near the 40-bit (1 TiB) boundary,
and without an IOMMU I have no way to force higher addresses either. That
means I can't bisect the real upper boundary (37/38/39/40 bits) on this
hardware - the most I can observe is that the failure already happens just
above 36 bits.

Given that 36 bits is the only value I can reliably verify fixes the hang on
real hardware here, I'd lean towards keeping it as a conservative default
unless we can establish that a higher mask is safe across revisions.

Regards,
Xincheng