Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework

From: Hans de Goede
Date: Mon Oct 12 2020 - 07:46:25 EST


Hi Daniel,

On 10/12/20 12:30 PM, Daniel Lezcano wrote:

Hi Hans,

On 07/10/2020 12:43, Hans de Goede wrote:
Hi,

On 10/6/20 2:20 PM, Daniel Lezcano wrote:
The density of components has greatly increased over the last decade,
bringing a large number of heat sources which are monitored by more than 20
sensors on recent SoCs. The skin temperature, which is the case
temperature of the device, must stay below approximately 45°C in order
to comply with the legal requirements.

The skin temperature is managed as a whole by a user space daemon,
which tracks the current application profile in order to allocate a power
budget to the different components such that the resulting heating effect
complies with the skin temperature constraint.

This technique is called Dynamic Thermal Power Management.

The Linux kernel does not provide any unified interface to act on the
power of the different devices. Currently, the thermal framework is
changed to artificially export the performance states of different
devices, as opaque values, via the cooling device software component.
This change is done regardless of the in-kernel logic to mitigate the
temperature. The user space daemon uses all the available knobs to act
on the power limit and those differ from one platform to another.

This series provides a Dynamic Thermal Power Management framework to
provide a unified way to act on the power of the devices.

Interesting, we have a related (while at the same time almost
orthogonal) discussion going on about setting policies for whether
the code managing the restraints
(which on x86 is often hidden in firmware or ACPI DPTF tables)
should have a bias towards as long a battery life
as possible vs maximum performance. I know those 2 aren't
always opposite ends of a spectrum with race-to-idle, yet most
modern x86 hardware has some notion of what I call performance-profiles
where we can tell the firmware managing this to go for a bias towards
low-power / balanced / performance.

I've sent an RFC / sysfs API proposal for this here:
https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@xxxxxxxxxx/

I've read the patches in this thread and as said already I think
the 2 APIs are mostly orthogonal. The API in this thread is giving
userspace direct access to detailed power-limits allowing userspace
to configure things directly (and for things to work optimally userspace
must do this). Whereas in the x86 case which I'm dealing with, everything
is mostly handled in a black-box and userspace can merely configure
the low-power / balanced / performance bias (*) of that black-box.

Still, I think it is good if we are aware of each other's efforts here.

So Daniel, if you can take a quick look at my proposal:
https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@xxxxxxxxxx/

That would be great. I think we definitely want to avoid having 2
APIs for the same thing here. Again, I don't think that is actually
the case, but maybe you see this differently?

Thanks for pointing this out. Actually, it is a different feature, as you
mentioned. The profile is the same knob we have in the BIOS, where we
can choose power / balanced power / balanced / balanced
performance / performance, AFAICT.

Right.

Here the proposed interface is already exported to userspace via the
powercap framework, which today supports the backend driver for the RAPL
register.

You say that some sort of power / balanced power / balanced /
balanced performance / performance setting is already exported
through the powercap interface today (if I understand you correctly)?

But I'm not seeing any such setting in:
Documentation/ABI/testing/sysfs-class-powercap

Nor can I find it under /sys/class/powercap/intel-rapl* on a ThinkPad
X1 Carbon 8th gen.
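
For anyone who wants to repeat this check on their own machine, here is a
minimal Python sketch that lists the attribute files each powercap zone
exposes. /sys/class/powercap is the standard sysfs location; the helper
name list_zone_attributes is made up for illustration, and what it returns
depends entirely on the hardware and the loaded drivers.

```python
import os

POWERCAP_ROOT = "/sys/class/powercap"  # standard powercap sysfs location


def list_zone_attributes(root=POWERCAP_ROOT):
    """Return {zone_name: sorted attribute file names} for each powercap zone.

    Hypothetical helper for illustration only: it lists file names and
    does not read or interpret any attribute values.
    """
    zones = {}
    if not os.path.isdir(root):
        return zones  # powercap is not available on this system
    for entry in sorted(os.listdir(root)):
        zone_dir = os.path.join(root, entry)
        if not os.path.isdir(zone_dir):
            continue  # ignore anything that is not a zone directory
        zones[entry] = sorted(
            name for name in os.listdir(zone_dir)
            if os.path.isfile(os.path.join(zone_dir, name))
        )
    return zones
```

On a RAPL system the result should contain the attributes documented in
sysfs-class-powercap, such as energy_uj and constraint_0_power_limit_uw,
which makes it easy to see whether any profile-style attribute is present.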

Note, if there indeed is an existing userspace API for this I would
greatly prefer for the thinkpad_acpi and hp-wmi (and possibly other)
drivers to use this, so if you can point me to this interface then
that would be great.

Userspace will be in charge of the logic that tunes the
power/performance profile for the application currently running in the
foreground. The DTPM framework gives unified access to the power limits
of the individual devices that the userspace logic can act on.

A side note, related to your proposal, not this patch. IMO it would be
better to have /sys/power/profile.

cat /sys/power/profile

power
balanced_power *
balanced
balanced_performance
performance

The (*) being the active profile.
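
To illustrate how userspace might consume such a file, here is a minimal
Python sketch that parses the listing format shown above. Note that
/sys/power/profile is only a proposal and does not exist today; the
function name parse_profiles and the exact format (one profile per line,
the active one followed by '*') are assumptions taken from the example.

```python
def parse_profiles(text):
    """Parse a listing with one profile per line, the active one marked '*'.

    Returns (profiles, active). The format is an assumption based on the
    example in this mail; /sys/power/profile is a proposal, not an
    existing sysfs file.
    """
    profiles = []
    active = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        profiles.append(fields[0])
        if "*" in fields[1:]:
            active = fields[0]  # the marker follows the profile name
    return profiles, active


example = """\
power
balanced_power *
balanced
balanced_performance
performance
"""

profiles, active = parse_profiles(example)
print(profiles)
print(active)
```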

Interesting, the same thing was brought up in the discussion surrounding
the RFC which I posted.

The downside of this approach is that it assumes that there
is only a single system-wide setting. AFAIK that is not always
the case, e.g. (AFAIK):

1. The intel pstate driver has something like this
(might this be the RAPL setting you mean?)

2. The X1C8 has such a setting for the embedded-controller, controlled
through the ACPI interfaces which thinkpad_acpi uses

3. The hp-wmi interface allows selecting a profile which in turn
(through AML code) sets a bunch of variables which influence how
the (dynamic, through mjg59's patches) DPTF code controls various
things

At least the pstate setting and the vendor specific settings can
co-exist. Also the powercap API has a notion of zones, I can see the
same thing here, with a desktop e.g. having separate performance-profile
selection for the CPU and a discrete GPU.

So limiting the API to a single /sys/power/profile setting seems a
bit limiting and I have the feeling we will regret making this
choice in the future.

With that said your proposal would work well for the current
thinkpad_acpi / hp-wmi cases, so I'm not 100% against it.

This would require adding some internal API to the code which
owns the /sys/power root-dir to allow registering a profile
provider I guess. But that would also immediately raise the
question: what if multiple drivers try to register themselves
as the /sys/power/profile provider?
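
To make the question concrete, here is a toy Python model of the simplest
possible policy, first come first served. This is only a sketch of the
problem, not a proposed kernel API: the class and method names are made up,
and the idea that a real implementation would reject the second driver with
something like -EBUSY is an assumption.

```python
class ProfileProviderRegistry:
    """Toy model of a single-slot profile provider registry (not kernel code)."""

    def __init__(self):
        self._provider = None

    def register(self, name):
        """Register `name` as the provider; refuse if the slot is taken."""
        if self._provider is not None:
            # Assumption: a kernel API might return -EBUSY in this case.
            return False
        self._provider = name
        return True

    def unregister(self, name):
        """Free the slot, but only if `name` is the current provider."""
        if self._provider == name:
            self._provider = None

    @property
    def provider(self):
        return self._provider


registry = ProfileProviderRegistry()
first = registry.register("thinkpad_acpi")   # succeeds, slot was free
second = registry.register("intel_pstate")   # refused, slot already taken
```

With this policy whichever driver loads first wins, and the second
setting is simply not reachable through the single file.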

Regards,

Hans