RE: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver

From: John Madieu
Date: Tue Mar 11 2025 - 07:33:44 EST


Hi Biju,

Thanks for your review.

> -----Original Message-----
> From: Biju Das <biju.das.jz@xxxxxxxxxxxxxx>
> Sent: Monday, March 10, 2025 11:18 AM
> To: John Madieu <john.madieu.xa@xxxxxxxxxxxxxx>; geert+renesas@xxxxxxxxx;
> niklas.soderlund+renesas@xxxxxxxxxxxx; conor+dt@xxxxxxxxxx;
> krzk+dt@xxxxxxxxxx; robh@xxxxxxxxxx; rafael@xxxxxxxxxx;
> daniel.lezcano@xxxxxxxxxx
> Subject: RE: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
>
> Hi John,
>
> Thanks for the patch.
>
> > -----Original Message-----
> > From: John Madieu <john.madieu.xa@xxxxxxxxxxxxxx>
> > Sent: 09 March 2025 12:13
> > Subject: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
> >
> > This patch series introduces a new thermal cooling driver that
> > implements CPU hotplug-based thermal management. The driver
> > dynamically takes CPUs offline during thermal excursions to reduce
> > power consumption and prevent overheating, while maintaining system
> > stability by keeping at least one CPU online.
> >
> > 1- Problem Statement
> >
> > Modern SoCs require robust thermal management to prevent overheating
> > under heavy workloads. Existing cooling mechanisms like frequency
> > scaling may not always provide sufficient thermal relief, especially in
> > multi-core systems where per-core thermal contributions can be
> > significant.
> >
> > 2- Solution Overview
> >
> > The driver:
> >
> > - Integrates with the Linux thermal framework as a cooling device
> > - Registers per-CPU cooling devices that respond to thermal trip
> > points
> > - Uses CPU hotplug operations to reduce thermal load
> > - Maintains system stability by preventing the boot CPU from being
> > put offline, regardless of which CPUs are specified in the cooling
> > device list.
> > - Implements proper state tracking and cleanup
> >
> > Key Features:
> >
> > - Dynamic CPU online/offline management based on thermal thresholds
> > - Device tree-based configuration via thermal zones and trip points
> > - Hysteresis support through thermal governor interactions
> > - Safe handling of CPU state transitions during module load/unload
> > - Compatibility with existing thermal management frameworks
> >
> > Testing
> >
> > - Verified on Renesas RZ/G3E platforms with multi-core CPU
> > configurations
> > - Validated thermal response using artificial load generation
> > (emul_temp)
> > - Confirmed proper interaction with other cooling devices
> > - Verified support for 'plug' type trace events
> > - Tested with step_wise governor
> >
> > As the 'hot' type is already used for user space notification, I've
> > chosen 'plug' for this new type. Suggestions on this are welcome. Here
> > is an example of a 'thermal-zone' that integrates the 'plug' type:
> >
> > ```
> > thermal-zones {
> > cpu-thermal {
> > polling-delay = <1000>;
> > polling-delay-passive = <250>;
> > thermal-sensors = <&tsu>;
> >
> > cooling-maps {
> > map0 {
> > trip = <&target>;
> > cooling-device = <&cpu0 0 3>, <&cpu3 0 3>;
> > contribution = <1024>;
> > };
>
> Is it not possible here to make cpu1 and cpu2 as well for DVFS passive
> cooling?

From my tests, adding the same CPUs as cooling devices in both maps
generated warnings saying that the trip could not be bound
to my ("plug") cooling device.

This is a point I still need to investigate, and comments from maintainers
would be welcome. Despite these warnings, however, I saw no unexpected
behavior, and even the thermal trace events were OK.
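
For illustration only, the overlapping configuration under discussion
would look roughly like the fragment below. It is assembled from the
example quoted in this thread, assuming cpu1 and cpu2 are simply appended
to map0 with the same 0..3 state range; it is a sketch, not the exact
device tree used in the tests.

```
cooling-maps {
	map0 {
		/* DVFS map from the example; cpu1/cpu2 added (assumed range) */
		trip = <&target>;
		cooling-device = <&cpu0 0 3>, <&cpu1 0 3>,
				 <&cpu2 0 3>, <&cpu3 0 3>;
		contribution = <1024>;
	};

	map1 {
		/* hotplug map from the example; cpu1/cpu2 referenced again */
		trip = <&trip_emergency>;
		cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
		contribution = <1024>;
	};
};
```

Here &cpu1 and &cpu2 appear as cooling devices in both maps, which is
the situation in which the binding warnings were seen.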

>
> >
> > map1 {
> > trip = <&trip_emergency>;
> > cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
> > contribution = <1024>;
> > };
> >
> > };
>
> Is it not possible here to make cpu3 as well as hot pluggable device for
> cooling?
>
> Cheers,
> Biju

Regards,
John