Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
From: Shawn Lin
Date: Wed Apr 01 2026 - 03:11:51 EST
+ Finley
在 2026/04/01 星期三 14:13, Daniel Bozeman 写道:
<fc6f00fa-10b0-44c8-d8b4-694f8ff3b9ea@xxxxxxxxxxxxxx>
I ran additional tests to gather evidence:
Test 1: Patch 2 only (skip EPROBE_DEFER), no patch 1.
Result: kernel panic. The idle-only domains register
successfully, but genpd_power_off_work_fn attempts to power
them off and crashes in rockchip_pmu_set_idle_request:
Internal error: synchronous external abort: 0000000096000010
CPU: 0 PID: 59 Comm: kworker/0:3
Workqueue: pm genpd_power_off_work_fn
pc : regmap_mmio_read32le+0x8/0x20
Call trace:
regmap_mmio_read32le+0x8/0x20
_regmap_bus_reg_read+0x6c/0xac
_regmap_read+0x60/0xd8
regmap_read+0x4c/0x7c
rockchip_pmu_set_idle_request.isra.0+0x94/0x1b4
rockchip_pd_power+0x378/0x604
rockchip_pd_power_off+0x14/0x34
genpd_power_off.isra.0+0x1f0/0x2f0
genpd_power_off_work_fn+0x34/0x54
Test 2: No kernel patches, PD_GPU disabled via
status = "disabled" in DTS to avoid EPROBE_DEFER entirely.
Result: same kernel panic. The idle-only domains register
but crash identically when genpd tries to power them off.
Same call trace as above.
So the crash is not caused by probe ordering or
EPROBE_DEFER -- it happens whenever idle-only domains
(pwr_mask == 0) are registered and genpd attempts to
power them off. The QoS register access in
rockchip_pmu_set_idle_request faults on these domains.
Regarding S2R: you raise a valid concern about skipping
QoS save/restore. However, these idle-only domains cannot
actually be powered off (pwr_mask == 0), so
rockchip_do_pmu_set_power_domain already returns early for
them. The QoS save/idle/restore cycle in rockchip_pd_power
has no effect on these domains since the power state never
changes -- the save and restore are paired around a no-op.
Skipping the entire sequence for pwr_mask == 0 should be
safe for S2R as well.
Thanks for these details but I think it explains the phenomenon
and work around it but didn't explain the root cause.
RK3528 SoC can't power down these PDs but only support to idle them.
Right, idle these PDs could still make QoS registers inaccessable.
But from the code, rockchip_pmu_save_qos() and
rockchip_pmu_restore_qos() both are called under idle-free state.
One possible guess is it's clk related. Could you please help
test your environment with "clk_ignore_unused" set in cmdline?
Another test is to print out genpd->name in the entry of
rockchip_pd_power_on() and rockchip_pd_power_off() to see
which one is inaccessable.
Both patches are needed:
- Patch 1: prevents the QoS crash on idle-only domains
- Patch 2: prevents probe teardown from making things worse
Tested on NanoPi Zero2 (RK3528) with all four scenarios:
both patches (boots), patch 2 only (panic), DTS workaround
(panic), both patches + E20C regression test (no issues).
The board DTS that triggers this (GPIO-controlled USB VBUS
regulator on GPIO4/PD_RKVENC) can be seen at:
https://github.com/dboze/openwrt/blob/add-nanopi-zero2-clean/target/linux/rockchip/patches-6.12/102-arm64-dts-rockchip-Add-FriendlyElec-NanoPi-Zero2.patch