Re: [PATCH v5 6/6] arm64: dts: qcom: Enable cpu cooling devices for QCS9075 platforms

From: Manaf Meethalavalappu Pallikunhi
Date: Wed Jan 08 2025 - 07:11:07 EST



Hi Konrad,

On 12/31/2024 9:51 PM, Konrad Dybcio wrote:
On 31.12.2024 12:05 PM, Manaf Meethalavalappu Pallikunhi wrote:
Hi Konrad,

On 12/30/2024 9:05 PM, Konrad Dybcio wrote:
On 29.12.2024 4:23 PM, Wasim Nazir wrote:
From: Manaf Meethalavalappu Pallikunhi <quic_manafm@xxxxxxxxxxx>

In the QCS9100 SoC, the safety subsystem monitors all thermal sensors
and takes corrective action for each subsystem based on sensor
violations to comply with safety standards. But as QCS9075 is a
non-safe SoC, it requires conventional thermal mitigation to control
the temperature of different subsystems.

The cpu frequency throttling for different cpu tsens is enabled in
hardware as the first defense for cpu thermal control. But the QCS9075
SoC has a higher ambient specification. Under high ambient conditions,
even the lowest frequency with multiple cores active can slowly build
up heat over time and lead to thermal runaway. This patch restricts
cpu cores in this scenario, which helps further thermal control and
avoids critical thermal violations.

Add cpu idle injection cooling bindings for cpu tsens thermal zones
as a mitigation for cpu subsystem prior to thermal shutdown.

Add cpu frequency cooling devices that will be used by the userspace
thermal governor for skin temperature mitigation.

Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@xxxxxxxxxxx>
---
Does this bring measurable benefits over just making the CPU a cooling
device and pointing the thermal zones to it (and not the idle subnode)?

Konrad
As noted in the commit message, CPU frequency mitigation is handled by hardware as the first-level mitigation. The software/scheduler is updated about this mitigation via the arch_update_hw_pressure API [1]. Adding the same CPU frequency mitigation in the thermal zones would be redundant. We are adding idle injection with a 100% duty cycle as an additional mitigation step at a higher trip point to further reduce CPU power consumption. This improves the device's thermal stability, especially in high ambient conditions.
I understood this much from the commit message.

What I'm asking is, whether your solution actually works better than just
letting Linux software-throttle the CPUs, preferably backed by some
numbers.
I hope by ‘your solution’ you mean HW CPU frequency throttling. Yes, we benefit from the hardware approach compared to Linux software-based CPU throttling, both in terms of tighter thermal control and improved performance.
For the Dhrystone use case on one of our boards, we observe only a 0.3°C overshoot, compared to 2.5°C with software CPU throttling using the stepwise governor for the same trip threshold.

I'm also unsure how this is supposed to reduce power consumption. If the
CPUs aren't busy, they should idle, and if they are not fully utilized, a
lower frequency would likely be scheduled.

By using CPU idle injection, we force the CPU to enter idle in its lowest-power LPM state at high temperatures. This approach is similar to hot-plugging a core: it further reduces static power for that CPU, which helps bring the temperature under control.
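
As a rough illustration of what a 100% idle-injection duty cycle means, here is a minimal sketch that is not taken from the patch. It assumes the relationship described in the CPU idle cooling documentation [1]: for a fixed idle window, the cooling state is the requested idle percentage, and the run time between idle windows shrinks as that percentage grows. The function name run_time_us and the 10000 us idle window used below are hypothetical, chosen only for the example.

```c
#include <assert.h>

/*
 * Illustrative helper (hypothetical name): given a fixed injected idle
 * window and a requested idle percentage (cooling state 0..100), return
 * how long the CPU is allowed to run before the next idle window.
 *
 * idle_pct == 0 means no idle injection at all; 0 is returned as a
 * placeholder because no run/idle cycle exists in that case.
 */
static unsigned int run_time_us(unsigned int idle_duration_us,
                                unsigned int idle_pct)
{
    assert(idle_pct <= 100);

    if (idle_pct == 0)
        return 0;

    /* idle / (idle + run) == idle_pct / 100, solved for run */
    return (idle_duration_us * 100u) / idle_pct - idle_duration_us;
}
```

With an assumed 10000 us idle window, a 50% state leaves 10000 us of run time per cycle and a 25% state leaves 30000 us, while a 100% state leaves no run time at all. That last case is the one that behaves much like taking the core offline.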

[1]. https://docs.kernel.org/driver-api/thermal/cpu-idle-cooling.html

Best regards,

Manaf


Konrad


[1]. https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/cpufreq/qcom-cpufreq-hw.c?h=next-20241220#n352

Best regards,

Manaf