Re: PowerOP 0/3: System power operating point management API

From: Todd Poynor
Date: Tue Aug 16 2005 - 20:40:34 EST


Dominik Brodowski wrote:

> First, the table interface you suggest is ugly. If there's indeed the need for
> such an abstraction, I'd favour something like

I'm planning to adopt the previous suggestions of an opaque data structure and stop trying to have any generic structure to it. I'll try to leave dependency checking etc. to the upper layers as much as possible, since platforms vary greatly in this and so do the needs of different PM s/w stacks.
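As a rough illustration of the opaque-structure idea, here is a minimal sketch. It is not code from the patch: the names powerop_set_point/powerop_get_point, the two fields, and the current_hw stand-in for real hardware are all assumptions made for the example. The point is only that generic code passes a pointer around while the platform code alone knows the layout, and that no dependency checking happens at this level.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical, chosen for illustration only. */

/* Generic code sees just this forward declaration (an opaque type). */
struct powerop_point;

/* Platform code is the only place that knows the layout (assumed fields). */
struct powerop_point {
    unsigned int cpu_khz;   /* CPU clock in kHz */
    unsigned int core_mv;   /* core voltage in mV */
};

/* Stand-in for the hardware registers a real platform would program. */
static struct powerop_point current_hw;

/*
 * Set the operating point. No dependency or validity checking is done
 * here; that is left to the upper layers that built the point.
 */
int powerop_set_point(const struct powerop_point *pt)
{
    if (pt == NULL)
        return -1;
    current_hw = *pt;
    return 0;
}

/* Read back the operating point currently in effect. */
int powerop_get_point(struct powerop_point *pt)
{
    if (pt == NULL)
        return -1;
    *pt = current_hw;
    return 0;
}
```

An upper layer such as cpufreq or DPM would build the structure through platform-provided helpers and hand it down unchanged.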

> Secondly, you do not address the cross-relationships between operation points
> correctly. If you change the CPU frequency, you may have to switch other
> (memory, video) settings; you might even have to validate the frequency
> settings for these or even additional reasons (thermal and battery reasons -
> ACPI _PPC).

This lowest layer basically assumes that upper-layer software has created an appropriate operating point and/or will call driver notifiers etc. as needed to adapt to the changes. (In DPM, for example, we pretty much require a system designer to create operating points that match the h/w specs, and we don't go to great lengths to encode rules about this.) Although some sanity checking may be appropriate at the PowerOP level, cpufreq, DPM, etc. can for the most part continue to handle the larger issues of how valid operating points are constructed, driver callbacks, etc. If you do want to handle various dependencies at the PowerOP layer, nothing prevents that, but PM frameworks tend to embody assumptions about how frequently operating points will change and in what contexts (interrupt, idle...), and this can influence the code for such things.
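To make the split of responsibilities concrete, here is a small sketch of an upper layer that calls driver notifiers around a change while the lowest layer simply applies it. Everything here is hypothetical (pm_register_notifier, pm_change_khz, the lowlevel_* functions, the 400 MHz display limit); it is not the cpufreq or DPM notifier API, just the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

/* All names are hypothetical; this is not the cpufreq/DPM notifier API. */
enum pm_event { PM_PRECHANGE, PM_POSTCHANGE };

typedef int (*pm_notifier_fn)(enum pm_event ev, unsigned int new_khz);

#define MAX_NOTIFIERS 8
static pm_notifier_fn notifiers[MAX_NOTIFIERS];
static size_t n_notifiers;

/* Stand-in for the hardware state owned by the lowest layer. */
static unsigned int hw_khz = 600000;

/* Lowest layer: applies the setting and trusts the caller. */
static void lowlevel_set_khz(unsigned int khz) { hw_khz = khz; }
unsigned int lowlevel_get_khz(void) { return hw_khz; }

int pm_register_notifier(pm_notifier_fn fn)
{
    if (fn == NULL || n_notifiers == MAX_NOTIFIERS)
        return -1;
    notifiers[n_notifiers++] = fn;
    return 0;
}

/* Upper layer: ask drivers first, then change, then tell drivers. */
int pm_change_khz(unsigned int new_khz)
{
    size_t i;

    for (i = 0; i < n_notifiers; i++)
        if (notifiers[i](PM_PRECHANGE, new_khz) != 0)
            return -1;              /* a driver vetoed the change */

    lowlevel_set_khz(new_khz);

    for (i = 0; i < n_notifiers; i++)
        notifiers[i](PM_POSTCHANGE, new_khz);
    return 0;
}

/* Example driver: a display that cannot run below 400 MHz (made-up limit). */
int lcd_notifier(enum pm_event ev, unsigned int new_khz)
{
    if (ev == PM_PRECHANGE && new_khz < 400000)
        return -1;
    return 0;
}
```

Registering lcd_notifier and then requesting 200000 kHz fails and leaves the hardware untouched; the dependency logic lives entirely above the low-level setter.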

> Thirdly, who is to decide on the power management settings? The first and
> intuitive answer is the kernel. Therefore, kernel-space cpufreq governors
> exist. Only under rare circumstances, you want full userspace control --
> that's what the userspace cpufreq governor is for.

Also something left to the existing upper layers; PowerOP isn't intended to handle any of that. In the embedded space we usually let the system designer choose operating points that are supported by their h/w vendor and that match their particular system states (hardware enabled at any point in time, type and power/performance needs of software currently running). We do recommend that a userspace power policy manager be the component in charge of PM settings, based on messages from drivers and other apps on the state of the system. That userspace component then activates the operating point (or set of operating points in the case of DPM) appropriate for the current state.
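For illustration, the decision such a userspace policy manager makes could be as simple as the sketch below. The state fields and the operating point names ("low", "medium", "high") are invented for the example; a real system designer would define both, and the step that actually activates the chosen point is left out because it depends on the interface the platform exposes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical system state, as reported by drivers and applications. */
struct system_state {
    int on_battery;     /* nonzero when running from battery */
    int display_on;     /* nonzero when the display is enabled */
    int media_playing;  /* nonzero when a media workload is active */
};

/*
 * Map the current state to the name of a designer-defined operating
 * point. The names and rules are made up for this example.
 */
const char *policy_pick_point(const struct system_state *s)
{
    if (s->media_playing)
        return "high";
    if (s->display_on)
        return s->on_battery ? "medium" : "high";
    return "low";
}
```

The rules here are deliberately trivial; the point is that they sit in userspace and only select among operating points that were validated ahead of time.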

> Fourthly, the code duplication which your implementation leads to is obvious
> for the speedstep-centrino case.

We could move the tables of valid cpu speeds and corresponding voltages down to the PowerOP level, and there would probably be little duplication at that point. In fact, with the current patch there's not a lot of duplication, since the actual MSR access was moved to PowerOP and PowerOP contains little else. Both levels do know how to interpret the MSR format, though, and a more aggressive port to PowerOP could do away with that.
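A sketch of what keeping the table and the MSR format knowledge in one place might look like follows. It assumes a Banias-style encoding of the 16-bit control value, with bits 15:8 holding the bus ratio (times 100 MHz) and bits 7:0 a voltage ID (700 mV plus 16 mV per step); the table entries are illustrative rather than a vetted list for any particular CPU, and the function names are hypothetical. No actual MSR access is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One valid speed/voltage pair. */
struct op_entry {
    unsigned int mhz;   /* core clock in MHz */
    unsigned int mv;    /* core voltage in mV */
};

/* Illustrative entries only; not a verified table for a real part. */
static const struct op_entry op_table[] = {
    {  600,  956 },
    { 1000, 1164 },
    { 1600, 1484 },
};

#define OP_TABLE_LEN (sizeof(op_table) / sizeof(op_table[0]))

/* Assumed encoding: bits 15:8 = mhz / 100, bits 7:0 = (mv - 700) / 16. */
uint16_t op_encode(const struct op_entry *op)
{
    assert(op->mv >= 700);
    return (uint16_t)(((op->mhz / 100) << 8) | ((op->mv - 700) / 16));
}

/* Inverse of op_encode() under the same assumed encoding. */
void op_decode(uint16_t val, struct op_entry *op)
{
    op->mhz = (unsigned int)(val >> 8) * 100;
    op->mv = 700 + (unsigned int)(val & 0xff) * 16;
}

/* Find the table entry for a given speed, or NULL if it is not valid. */
const struct op_entry *op_lookup(unsigned int mhz)
{
    size_t i;

    for (i = 0; i < OP_TABLE_LEN; i++)
        if (op_table[i].mhz == mhz)
            return &op_table[i];
    return NULL;
}
```

With the encode/decode pair living only at the lower level, the upper layer would deal in MHz and mV and never touch the register layout.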

Your suggestions of changes to cpufreq governors and policies to handle governance of non-cpu-speed parameters sound interesting, and I'd be happy to help figure out what to do about those vs. the lower machine access layer I've discussed up until now. I'll think more about this real soon now. Thanks,

--
Todd