Re: [RFC] Documentation: Add documentation for new performance_profile sysfs class
From: Hans de Goede
Date: Fri Oct 09 2020 - 07:33:54 EST
Hi,
On 10/5/20 2:58 PM, Limonciello, Mario wrote:
On modern systems CPU/GPU/... performance is often dynamically configurable
in the form of e.g. variable clock-speeds and TPD. The performance is often
automatically adjusted to the load by some automatic-mechanism (which may
very well live outside the kernel).
These auto performance-adjustment mechanisms often can be configured with
one of several performance-profiles, with either a bias towards low-power
consumption (and cool and quiet) or towards performance (and higher power
consumption and thermals).
Introduce a new performance_profile class/sysfs API which offers a generic
API for selecting the performance-profile of these automatic-mechanisms.
If introducing an API for this - let me ask the question, why even let each
driver offer a class interface and userspace need to change "each" driver's
performance setting?
I would think that you could just offer something kernel-wide like
/sys/power/performance-profile
Userspace can read and write to a single file. All drivers can get notified
on this sysfs file changing.
In the case of the currently intended users of this API there will be
only 1 provider (using the system type). So that pretty much does what
you suggest.
But I can see there being multiple components in a system which
each can have their own performance-profile. E.g. in some desktop
cases the CPU and GPU may be in separate compartments of the case
which each have their own independent airflow (and thus cooling budget).
Some components may even have their own air cooling with their own
external radiator.
So given the (potential) case with multiple components with each
their own thermal-profile. Then exporting only a single setting
for all components combined has 2 problems:
That would mean either some complicated policy in the kernel
for this, or a simple one where we set them all to the same level.
The simple policy has a number of issues:
1. Setting all components to the same level assume they have
identical profile options, but one component might offer low-power,
while another component does not offer that (the other component
could instead e.g. only offer quiet which is not 100% the same).
2. It is a given that some power users will be wanting to be able
to control the profiles of different components separately, so
having just a simply/naive policy in the kernel for this is not going
to work for all use-cases.
That leaves doing some complex policy mechanism, but in general
where possible the kernel tries to stay out of enforcing policies,
instead (where possible) it is greatly preferred to only offer a
mechanism to allow the functionality and then let userspace handle
any policy decisions.
So I believe that having 1 class device for each component
which offers selectable performance profiles is best.
Note that something like the performance-profile-daemon Bastien
is working on might very will choice to implement a KISS policy
where all components get configured with a similar performance-profile,
resulting in what you are suggesting. But at least this way
leaves the possibility to do things differently open.
Cc: Mark Pearson <markpearson@xxxxxxxxxx>
Cc: Elia Devito <eliadevito@xxxxxxxxx>
Cc: Bastien Nocera <hadess@xxxxxxxxxx>
Cc: Benjamin Berg <bberg@xxxxxxxxxx>
Cc: linux-pm@xxxxxxxxxxxxxxx
Cc: linux-acpi@xxxxxxxxxxxxxxx
Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx>
---
.../testing/sysfs-class-performance_profile | 104 ++++++++++++++++++
1 file changed, 104 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-class-performance_profile
diff --git a/Documentation/ABI/testing/sysfs-class-performance_profile
b/Documentation/ABI/testing/sysfs-class-performance_profile
new file mode 100644
index 000000000000..9c67cae39600
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-performance_profile
@@ -0,0 +1,104 @@
+Performance-profile selection (e.g.
/sys/class/performance_profile/thinkpad_acpi/)
+
+On modern systems CPU/GPU/... performance is often dynamically configurable
+in the form of e.g. variable clock-speeds and TPD. The performance is often
+automatically adjusted to the load by some automatic-mechanism (which may
+very well live outside the kernel).
Are you intending to word this specifically to cover both firmware and userspace
implementations? Or were you really meaning firmware implementations?
What I'm trying to cover here is both firmware (run by say some microcontroller/EC)
implementations as well and in-kernel implementations like cpufreq and possible
support for controlling GPU frequencies. Note this is intended to future proof
things in case e.g. a GPU driver, where freq. control is done inside the kernel.
would like to use this API. This scenario might never materialize.
Userspace implementations (which seems to likely become a thing on at least ARM)
I would expect to have their own API (e.g. dbus / configfile) to configure their
behavior rather then making a round trip through kernel space with some virtual
device exporting this sysfs API.
+These auto performance-adjustment mechanisms often can be configured with
+one of several performance-profiles, with either a bias towards low-power
+consumption (and cool and quiet) or towards performance (and higher power
+consumption and thermals).
+
+The purpose of the performance_profile class is to offer a generic sysfs
+API for selecting the performance-profile of these automatic-mechanisms.
+
+Note that this API is only for selecting the performance-profile, it is
+NOT a goal of this API to allow monitoring the resulting performance
+characteristics. Monitoring performance is best done with device/vendor
+specific tools such as e.g. turbostat.
Another thought that comes to mind (which is completely separate from my previous
idea):
Why not make this register to firmware-attributes class as being discussed in the
new Dell driver?
It seems like it could easily be read as:
/sys/class/firmware-attributes/thinkpad-foo/attributes/PerformanceProfile/current_value
/sys/class/firmware-attributes/thinkpad-foo/attributes/PerformanceProfile/possible_values
That would be mixing boot-time settings with runtime settings under the same
API. At least some Thinkpads already have firmware attributes for selecting
the performance profile but that just sets the profile at boot.
More in general we could put pretty much anything which can be expressed as
a key=value pair inside that API, but doing so feels like abusing the API.
The firmware-attributes API really is intended only for setting values which
are stored in some nvram by the firmware, not for runtime changes.
Regards,
Hans