Re: USB-audio isochronous Missed Service Errors on AMD Zen5 client (Fire Range) -- Data Fabric idle C-state? No OS-level knob found

From: Mathias Nyman

Date: Mon Jun 01 2026 - 08:59:13 EST

On 6/1/26 10:38, Gordon Chen wrote:

Hi Mario, Shyam, Mathias,

I'm reporting what looks like an interaction between AMD client SoC idle
power management (Data Fabric / SOCCLK) and low-latency isochronous USB
DMA, on a Ryzen 9 9955HX3D (Fire Range, Zen5) laptop. It reproduces on any
AMD Zen4/5 client I or others have tried, with any USB audio interface;
there is an open community bug with several different-vendor devices [1].

I have a strong differential pointing below the OS, but no way to read DF
C-state residency on this platform to confirm -- hence this mail. Questions
are at the end.

Symptom
-------
USB audio playback has frequent audible clicks, and after ~10-30 min of
continuous playback an occasional full stall. Each click maps 1:1 (time-
aligned) to a short isochronous OUT URB; at the packet level these are
iso_frame_desc[i].status = -EXDEV (COMP_MISSED_SERVICE_ERROR) -- the xHCI
controller missed servicing a 125 us microframe. urb->error_count stays 0
and nothing reaches dmesg; you have to look at per-packet status or count
short OUT URBs. The stall is the same fault at its extreme: on an implicit-
feedback device, a whole errored capture URB starves the OUT ring until
re-plug.

What I tried (all ineffective except the last)
----------------------------------------------
cpu_dma_latency = 0 (PM QoS, verified locked)          no change
cpuidle: all C-states disabled                         no change
cpufreq governor = performance                         no change
amdgpu power_dpm_force_performance_level = high        no change
stress-ng --cpu 16 (pure compute, little mem traffic)  no change
stress-ng --stream (sustained memory bandwidth)        misses -> 0

So CPU residency/frequency is not the lever; sustained memory-controller
traffic is. Reading SoC clock DPM (amdgpu pp_dpm_socclk / pp_dpm_mclk):
idle and under --cpu it sits at socclk 400 MHz / mclk 1600 MHz; under
--stream it jumps to socclk 1200 / mclk 2800.

This points away from clock frequency as the lever: a single --stream
worker already pins socclk=1200 / mclk=2800, yet the misses persist; it
takes several workers of continuous traffic before they drop to zero. So
what tracks the fix is not the clock the SoC reaches, but the amount of
sustained memory-controller traffic. My working hypothesis is that the
Data Fabric idles into a low-power state between the sparse isochronous
transactions, and that the wake/traversal latency on the next microframe
can then exceed the 125 us deadline -- with only traffic dense enough to
keep the fabric out of that state avoiding it. I cannot confirm that from
the OS (I have no way to read DF C-state residency here), so I would
welcome a sanity check on the mechanism; the measurements above are what
I am confident of.

I could not reach this from the OS: profile_peak / force=high do not raise
socclk (ignored for an otherwise-idle iGPU), and a manual pp_dpm_socclk /
pp_dpm_mclk write is rejected with -EINVAL. The BIOS on this laptop exposes
no "DF C-States" / "Power Supply Idle Control" option.

Reproducer
----------
Any AMD Zen4/5 client + any USB audio interface, playing continuously:

# count short OUT URBs (= missed-service events); nominal is the device's
# full OUT packet size (768 bytes for the Flow 8 at 48 kHz / 4 ch)
bpftrace -e 'kprobe:snd_complete_urb {
    $u = (struct urb *)arg0;
    if ((($u->pipe >> 7) & 1) == 0 && $u->actual_length < 768) { @miss++; }
}'

# while watching: cat /sys/class/drm/cardN/device/pp_dpm_socclk
# baseline: misses > 0, socclk pinned at 400 MHz
# with stress-ng --stream 4 running: misses 0, socclk 1200 MHz

Questions
---------
1. Is this a known interaction between DF idle power state (DF C-states /
SOCCLK gating) and latency-sensitive isochronous DMA on client SoCs --
an erratum or documented behaviour?

2. Is there a supported way to hold the Data Fabric out of its deep idle
state (or set a fabric-active floor) while a latency-sensitive iso USB
endpoint is streaming -- a kernel interface (amd_pmc / amd_node?), or a
firmware/PMFW setting -- short of burning memory bandwidth?

If the xHC sets the Latency Tolerance Messaging Capability (LTC) bit in the
HCCPARAMS1 register, it should be capable of accepting latency tolerance
messages from USB3 devices, keeping track of the shortest tolerated value,
and forwarding it to the fabric (over PCIe it should use PCIe LTR messages).
The xhci driver does not take part in this.
See xHCI spec section 4.13.6.

The LTC capability in xHCI also lets the driver add custom latency values
that the xHC hardware should take into account when calculating the shortest
tolerated latency.
The xhci driver does not (yet) support this, but it could be worth hacking
some PoC code together to see if this works.
The HCCPARAMS1 register can be read via debugfs from:
debug/usb/xhci/<address>/reg-cap
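
A rough way to check the bit from the reg-cap dump, as a sketch only: LTC
is bit 6 of HCCPARAMS1 (HCC_LTC in the driver's xhci.h). The parser
assumes the dump consists of "NAME = 0x..." lines, which I have not
verified on every kernel, and the sample value below is invented for
illustration, not read from real hardware.

```python
import re

HCC_LTC = 1 << 6  # HCCPARAMS1 bit 6: Latency Tolerance Messaging Capability

def parse_reg_cap(text):
    """Parse 'NAME = 0x...' lines into a {name: int} dict (assumed format)."""
    regs = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+)\s*=\s*(0x[0-9a-fA-F]+)", line)
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
    return regs

def supports_ltc(hccparams1):
    """True if the LTC bit is set in the given HCCPARAMS1 value."""
    return bool(hccparams1 & HCC_LTC)

if __name__ == "__main__":
    # Invented sample dump for illustration only.
    sample = "CAPLENGTH = 0x01000020\nHCCPARAMS1 = 0x000000c1\n"
    regs = parse_reg_cap(sample)
    print("LTC supported:", supports_ltc(regs["HCCPARAMS1"]))
```

If the bit is clear, the xHC has no LTM path at all and a PoC in the
driver would have nothing to hook into.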

Thanks
Mathias