Re: [PATCH 1/2] drm/msm/dpu: fix mismatch between power and frequency

From: Konrad Dybcio

Date: Thu Feb 26 2026 - 08:41:21 EST

On 1/12/26 9:25 AM, yuanjiey wrote:
> On Mon, Jan 12, 2026 at 09:38:41AM +0200, Dmitry Baryshkov wrote:
>> On Mon, 12 Jan 2026 at 08:23, yuanjiey <yuanjie.yang@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> On Fri, Jan 09, 2026 at 05:22:37PM +0200, Dmitry Baryshkov wrote:
>>>> On Fri, Jan 09, 2026 at 04:38:07PM +0800, yuanjie yang wrote:
>>>>> From: Yuanjie Yang <yuanjie.yang@xxxxxxxxxxxxxxxx>
>>>>>
>>>>> During DPU runtime suspend, calling dev_pm_opp_set_rate(dev, 0) drops
>>>>> the MMCX rail to MIN_SVS while the core clock frequency remains at its
>>>>> original (highest) rate. When runtime resume re-enables the clock, this
>>>>> may result in a mismatch between the rail voltage and the clock rate.
>>>>>
>>>>> For example, in the DPU bind path, the sequence could be:
>>>>> cpu0: dev_sync_state -> rpmhpd_sync_state
>>>>> cpu1: dpu_kms_hw_init
>>>>> timeline 0 ------------------------------------------------> t
>>>>>
>>>>> After rpmhpd_sync_state, the voltage performance is no longer guaranteed
>>>>> to stay at the highest level. During dpu_kms_hw_init, calling
>>>>> dev_pm_opp_set_rate(dev, 0) drops the voltage, causing the MMCX rail to
>>>>> fall to MIN_SVS while the core clock is still at its maximum frequency.
>>>>
>>>> Ah, I see. dev_pm_set_rate(0) transforms to _disable_opp_table(), which
>>>> doesn't do anything with clocks. I think we should have a call to
>>>> clk_disable_unprepare() before that and clk_prepare_enable() in the
>>>> resume path.
>>>
>>> I do check the backtrace on kaanapali:
>>>
>>> active_corner = 3 (MIN_SVS)
>>> rpmhpd_aggregate_corner+0x168/0x1d0
>>> rpmhpd_set_performance_state+0x84/0xd0
>>> _genpd_set_performance_state+0x50/0x198
>>> genpd_set_performance_state.isra.0+0xbc/0xdc
>>> genpd_dev_pm_set_performance_state+0x60/0xc4
>>> dev_pm_domain_set_performance_state+0x24/0x3c
>>> _set_opp+0x370/0x584
>>> dev_pm_opp_set_rate+0x258/0x2b4
>>> dpu_runtime_suspend+0x50/0x11c [msm]
>>> pm_generic_runtime_suspend+0x2c/0x44
>>> genpd_runtime_suspend+0xac/0x2d0
>>> __rpm_callback+0x48/0x19c
>>> rpm_callback+0x74/0x80
>>> rpm_suspend+0x108/0x588
>>> rpm_idle+0xe8/0x190
>>> __pm_runtime_idle+0x50/0x94
>>> dpu_kms_hw_init+0x3a0/0x4a8
>>>
>>> dev_pm_opp_set_rate(dev, 0); just low power to MIN_SVS.
>>> And freq MDP: 650MHz
>>>
>>>
>>> And I try change the order:
>>> from:
>>> dev_pm_opp_set_rate(dev, 0);
>>> clk_bulk_disable_unprepare(dpu_kms->num_clocks, dpu_kms->clocks);
>>> to:
>>> clk_bulk_disable_unprepare(dpu_kms->num_clocks, dpu_kms->clocks);
>>> dev_pm_opp_set_rate(dev, 0);
>>>
>>> But the issue is still here.
>>
>> But which clock is still demanding higher MMCX voltage? All DPU clocks
>> should be turned off at this point.
> Yes, no DPU clock demand higher MMCX voltage here. But next time pm_runtime_get_sync
> need higher MMCX voltagei due to high freq.
>
>>>
>>>
>>>>> When the power is re-enabled, only the clock is enabled, leading to a
>>>>> situation where the MMCX rail is at MIN_SVS but the core clock is at its
>>>>> highest rate. In this state, the rail cannot sustain the clock rate,
>>>>> which may cause instability or system crash.
>>>>>
>>>>> Fix this by setting the corresponding OPP corner during both power-on
>>>>> and power-off sequences to ensure proper alignment of rail voltage and
>>>>> clock frequency.
>>>>>
>>>>> Fixes: b0530eb11913 ("drm/msm/dpu: Use OPP API to set clk/perf state")
>>>>>
>>>>> Signed-off-by: Yuanjie Yang <yuanjie.yang@xxxxxxxxxxxxxxxx>
>>>>
>>>> No empty lines between the tags. Also please cc stable.
>>>
>>> Sure, will fix.
>>>
>>>>> ---
>>>>> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 16 ++++++++++++----
>>>>> drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 3 +++
>>>>> 2 files changed, 15 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
>>>>> index 0623f1dbed97..c31488335f2b 100644
>>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
>>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
>>>>> @@ -1306,9 +1306,14 @@ static int dpu_kms_init(struct drm_device *ddev)
>>>>> struct dpu_kms *dpu_kms = to_dpu_kms(priv->kms);
>>>>> struct dev_pm_opp *opp;
>>>>> int ret = 0;
>>>>> - unsigned long max_freq = ULONG_MAX;
>>>>> + dpu_kms->max_freq = ULONG_MAX;
>>>>> + dpu_kms->min_freq = 0;
>>>>>
>>>>> - opp = dev_pm_opp_find_freq_floor(dev, &max_freq);
>>>>> + opp = dev_pm_opp_find_freq_floor(dev, &dpu_kms->max_freq);
>>>>> + if (!IS_ERR(opp))
>>>>> + dev_pm_opp_put(opp);
>>>>> +
>>>>> + opp = dev_pm_opp_find_freq_ceil(dev, &dpu_kms->min_freq);
>>>>> if (!IS_ERR(opp))
>>>>> dev_pm_opp_put(opp);
>>>>>
>>>>> @@ -1461,8 +1466,8 @@ static int __maybe_unused dpu_runtime_suspend(struct device *dev)
>>>>> struct msm_drm_private *priv = platform_get_drvdata(pdev);
>>>>> struct dpu_kms *dpu_kms = to_dpu_kms(priv->kms);
>>>>>
>>>>> - /* Drop the performance state vote */
>>>>> - dev_pm_opp_set_rate(dev, 0);
>>>>> + /* adjust the performance state vote to low performance state */
>>>>> + dev_pm_opp_set_rate(dev, dpu_kms->min_freq);
>>>>
>>>> Here min_freq is the minumum working frequency, which will keep it
>>>> ticking at a high frequency. I think we are supposed to turn it off
>>>> (well, switch to XO). Would it be enough to swap these two lines
>>>> instead?
>>>
>>> Yes, just clk_bulk_disable_unprepare(dpu_kms->num_clocks, dpu_kms->clocks) to
>>> disable clk is OK.
>>> we can drop change here.
>>>
>>>>> clk_bulk_disable_unprepare(dpu_kms->num_clocks, dpu_kms->clocks);
>>>>>
>>>>> for (i = 0; i < dpu_kms->num_paths; i++)
>>>>> @@ -1481,6 +1486,9 @@ static int __maybe_unused dpu_runtime_resume(struct device *dev)
>>>>> struct drm_device *ddev;
>>>>>
>>>>> ddev = dpu_kms->dev;
>>>>> + /* adjust the performance state vote to high performance state */
>>>>> + if (dpu_kms->max_freq != ULONG_MAX)
>>>>> + dev_pm_opp_set_rate(dev, dpu_kms->max_freq);
>>>>
>>>> This one should not be necessary, we should be setting the performance
>>>> point while comitting the DRM state.
>>>
>>> No, issue is during dpu binding path. And in msm_drm_kms_init driver just
>>> pm_runtime_resume_and_get enable power and pm_runtime_put_sync disable power.
>>> But We do not set the appropriate OPP each time the power is enabled.
>>> As a result, a situation may occur where the rail stays at MIN_SVS while the
>>> MDP is running at a high frequency.
>>
>> Please move dev_pm_opp_set_rate() from dpu_kms_init() to dpu_kms_hw_init().
>
> During dpu_kms_hw_init and msm_drm_kms_init and msm_drm_kms_post_init.
>
> There are multiple places where pm_runtime_get_sync(pm_runtime_resume_and_get)and pm_runtime_put_sync are called.
> And each time after pm_runtime_get_sync(pm_runtime_resume_and_get) will access register depond on MDP clk.
>
> Do I need to add dev_pm_opp_set_rate after every pm_runtime_get_sync and pm_runtime_resume_and_get?

So I took another look at this thread and I think the correct resolution
here would be to stop calling dev_pm_opp_set_rate(dev, 0) altogether if
the clock is getting disabled, since the PM APIs seem to take care of
removing the vote during runtime suspend:

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index 61d7e65469b3..ddc6aeae8f55 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -1462,7 +1462,7 @@ static int __maybe_unused dpu_runtime_suspend(struct device *dev)
struct dpu_kms *dpu_kms = to_dpu_kms(priv->kms);

/* Drop the performance state vote */
- dev_pm_opp_set_rate(dev, 0);
+ pr_err("!!!! SUSPENDING DPU\n");
clk_bulk_disable_unprepare(dpu_kms->num_clocks, dpu_kms->clocks);

for (i = 0; i < dpu_kms->num_paths; i++)

[root@sc8280xp-crd ~]# XDG_RUNTIME_DIR=/run/user/1000 DISPLAY=:0 xset dpms force off
[ 163.099585] [drm:dpu_runtime_suspend:1465] !!!! SUSPENDING DPU
[root@sc8280xp-crd ~]# grep ^ /sys/kernel/debug/pm_genpd/mmcx/*
/sys/kernel/debug/pm_genpd/mmcx/active_time:159146 ms
/sys/kernel/debug/pm_genpd/mmcx/current_state:off-0
/sys/kernel/debug/pm_genpd/mmcx/devices:ad00000.clock-controller
/sys/kernel/debug/pm_genpd/mmcx/devices:af00000.clock-controller
/sys/kernel/debug/pm_genpd/mmcx/devices:ae01000.display-controller
/sys/kernel/debug/pm_genpd/mmcx/devices:aea0000.displayport-controller
/sys/kernel/debug/pm_genpd/mmcx/devices:ae90000.displayport-controller
/sys/kernel/debug/pm_genpd/mmcx/devices:ae98000.displayport-controller
/sys/kernel/debug/pm_genpd/mmcx/idle_states:State Time(ms) Usage Rejected Above Below S2idle
/sys/kernel/debug/pm_genpd/mmcx/idle_states:S0 67845 3 0 0 0 0
/sys/kernel/debug/pm_genpd/mmcx/idle_states_desc:State Latency(us) Residency(us) Name
/sys/kernel/debug/pm_genpd/mmcx/idle_states_desc:S0 0 0 N/A
/sys/kernel/debug/pm_genpd/mmcx/perf_state:0
/sys/kernel/debug/pm_genpd/mmcx/sub_domains:titan_top_gdsc
/sys/kernel/debug/pm_genpd/mmcx/sub_domains:disp0_mdss_gdsc
/sys/kernel/debug/pm_genpd/mmcx/sub_domains:disp0_mdss_int2_gdsc
/sys/kernel/debug/pm_genpd/mmcx/total_idle_time:67846 ms

(and the correct vote comes back up as the DPU resumes)

At the same time, we *do* need to ensure the correct level is set *before*
clk_prepare_enable and any set_rate operations (the latter is already done
via dev_pm_opp_set_rate())

clk_prepare_enable() happens in:
msm_mdss.c : msm_mdss_enable()
dpu_kms.c : dpu_runtime_resume()

(they refer to two disjoint sets of clocks but both cases need the fix)

I think the easiest solution in the MDP case would be to set the clocks to
the highest available OPP (lowest or *any* would work too, but going turbo
seems like it's going to give us a stronger foundation for adopting
cont_splash one day, avoiding potentially momentarily undervolting running
hw) during probe and count on these votes being adjusted either through
suspend (if unused) or to the actually needed frequency (via dpu_perf)

For MDSS, we're currently generally describing the MDSS_AHB clock, the
GCC_AHB clock and the MDP clock (sounds wrong?) - there's not even an OPP
table.. The GCC clock is sourced from (and scaled by) the NoC, but the
MDSS_AHB one seems to have 3 actually configurable performance points
that neither we nor seemingly the downstream driver seem to really care
about (i.e. both just treat it as on/off). If we need to scale it, we
should add an OPP table, if we don't, we should at least add required-opps.

The MDP4/MDP5 drivers are probably wrong too in this regard, but many
targets supported by these may not even have OPP tables and are generally
not super high priority for spending time on.. that can, I'd kick down the
road unless someone really wants to step up

Konrad