Re: [PATCH 1/2] pmdomain/rockchip: skip QoS operations for idle-only domains
From: Jonas Karlman
Date: Sun Apr 05 2026 - 19:30:06 EST
Hi,
On 4/4/2026 1:40 PM, Shawn Lin wrote:
> + Jonas
>
> On Sat, 2026/04/04 5:27, Daniel Bozeman wrote:
>> I ran both tests you requested:
>>
>> Test 1: Added pr_err to rockchip_pd_power_on/off to identify
>> the crashing domain. With patch 2 only (skip EPROBE_DEFER),
>> the crash occurs on PD_VO:
>
> Thanks for finding the PD_VO, and I'm still requesting more docs internally
> to check what's going on. I see there are several qos nodes under PD_VO,
> but I'm not sure if they all belong to PD_VO, and I'm not even sure if
> their registers are defined correctly.
>
> Perhaps, could you help dig more by removing the qos one by one from
> PD_VO to narrow down the broken qos?
>
> I have also looped in Jonas, who submitted the code, to have a look.
> (I'm also surprised to see there aren't any QoS nodes under PD_VO in the
> vendor kernel for reference, but upstream code has...)

Upstream included all QoS nodes that seemed to be related to each power
domain, based on e.g. vendor DTs, the clock driver and other hints.

The vendor kernel mostly seemed to take the easy way out and flagged all
rk3528 power domains as always on or similar, if I recall correctly.

For upstream we have instead tried to describe all power domains without
any always-on flag and instead ensure all devices belong to a power
domain.

I do not have access to any rk3528 TRM or similar, so I would not be
surprised if some details are wrong. However, runtime testing at the
time the patches were sent upstream did not show any issues.

I was however able to reproduce a crash using next-20260403 + rk3528 usb
series [1][2]. Such a crash did not happen at the original submission
of the pmdomain or usb series.

Looking at the pmdomain core and rk pmdomain driver changes since the
rk3528 merge, I see some changes that may have altered the behavior of
the driver, i.e. GENPD_FLAG_NO_STAY_ON.

The following quick diff seems to remove any changed behavior introduced
in commit 2bc12a8199a0 ("pmdomain: rockchip: Fix regulator dependency
with GENPD_FLAG_NO_STAY_ON"), and fixes the crash for me.

diff --git a/drivers/pmdomain/rockchip/pm-domains.c b/drivers/pmdomain/rockchip/pm-domains.c
index 490bbb1d1d8e..4d69b9f68886 100644
--- a/drivers/pmdomain/rockchip/pm-domains.c
+++ b/drivers/pmdomain/rockchip/pm-domains.c
@@ -892,7 +892,9 @@ static int rockchip_pm_add_one_domain(struct rockchip_pmu *pmu,
 	pd->genpd.power_on = rockchip_pd_power_on;
 	pd->genpd.attach_dev = rockchip_pd_attach_dev;
 	pd->genpd.detach_dev = rockchip_pd_detach_dev;
-	pd->genpd.flags = GENPD_FLAG_PM_CLK | GENPD_FLAG_NO_STAY_ON;
+	pd->genpd.flags = GENPD_FLAG_PM_CLK;
+	if (pd->info->pwr_mask || pd->info->status_mask)
+		pd->genpd.flags |= GENPD_FLAG_NO_STAY_ON;
 	if (pd_info->active_wakeup)
 		pd->genpd.flags |= GENPD_FLAG_ACTIVE_WAKEUP;
 	pm_genpd_init(&pd->genpd, NULL,

It could also be that GENPD_FLAG_NO_STAY_ON only needs to be applied to
need_regulator domains?
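
To make the two options easier to compare, here is a rough userspace sketch, not driver code. The struct pd_info_model and both helper functions are hypothetical names made up for illustration, and the flag bit values are placeholders rather than the real definitions from include/linux/pm_domain.h. Only the field names pwr_mask, status_mask and need_regulator are taken from the discussion above.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; NOT the kernel's definitions. */
#define GENPD_FLAG_PM_CLK      (1U << 0)
#define GENPD_FLAG_NO_STAY_ON  (1U << 1)

/* Hypothetical, trimmed-down stand-in for the driver's per-domain info. */
struct pd_info_model {
	const char *name;
	unsigned int pwr_mask;     /* 0 when the domain has no power control bit */
	unsigned int status_mask;  /* 0 when the domain has no power status bit */
	bool need_regulator;       /* domain depends on an external regulator */
};

/* Option A: same condition as the quick diff, idle-only domains are left out. */
static unsigned int flags_by_power_control(const struct pd_info_model *info)
{
	unsigned int flags = GENPD_FLAG_PM_CLK;

	if (info->pwr_mask || info->status_mask)
		flags |= GENPD_FLAG_NO_STAY_ON;

	return flags;
}

/* Option B: only domains that need a regulator get the flag. */
static unsigned int flags_by_regulator(const struct pd_info_model *info)
{
	unsigned int flags = GENPD_FLAG_PM_CLK;

	if (info->need_regulator)
		flags |= GENPD_FLAG_NO_STAY_ON;

	return flags;
}
```

With an idle-only entry modeled on the reported PD_VO case (pwr_mask=0x0, and status_mask assumed to be 0 as well), both helpers return only GENPD_FLAG_PM_CLK. The options differ for a domain that has power control bits but no regulator dependency: option A sets GENPD_FLAG_NO_STAY_ON there, option B does not.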
[1] https://lore.kernel.org/r/20250723122323.2344916-1-jonas@xxxxxxxxx/
[2] https://github.com/Kwiboo/linux-rockchip/commits/next-20260403-rk3528/
Regards,
Jonas
>
>>
>> rockchip_pd_power_off: vo pwr_mask=0x0
>> Internal error: synchronous external abort: 0000000096000010
>> Workqueue: pm genpd_power_off_work_fn
>> Call trace:
>> regmap_mmio_read32le+0x8/0x20
>> _regmap_bus_reg_read+0x6c/0xac
>> _regmap_read+0x60/0xd8
>> regmap_read+0x4c/0x7c
>> rockchip_pmu_set_idle_request.isra.0+0x98/0x16c
>> rockchip_pd_power+0x130/0x48c
>> rockchip_pd_power_off+0x38/0x48
>> genpd_power_off.isra.0+0x1f0/0x2f0
>> genpd_power_off_work_fn+0x34/0x54
>>
>> Test 2: Same debug build, booted with clk_ignore_unused
>> added to kernel cmdline via U-Boot. Same crash, same domain:
>>
>> rockchip_pd_power_off: vo pwr_mask=0x0
>> Internal error: synchronous external abort: 0000000096000010
>> (identical call trace)
>>
>> The crash occurs even with clk_ignore_unused. The QoS
>> registers for PD_VO are inaccessible when genpd attempts
>> to power off this idle-only domain.
>>