Re: PowerOP 0/3: System power operating point management API

From: Dominik Brodowski
Date: Tue Aug 16 2005 - 06:32:32 EST


Hi!

The PowerOP infrastructure you suggest surely is one path to better runtime
power management in the Linux kernel. However, I don't like it at all in its
current implementation. Here are a few suggestions for improvements,
rewrites, and so on:

First, the table interface you suggest is ugly. If there's indeed the need for
such an abstraction, I'd favour something like

struct powerop {
	struct list_head powerop_values; /* linked list of powerop_values */
	...
};

struct powerop_value {
	unsigned long value_cur;
	unsigned long value_min;
	unsigned long value_max;
	struct list_head next;
	u32 type; /* POWEROP_TYPE_* bitmask; u16 cannot hold 0x00010000 */
	struct powerop_value *cross_dependency;
	struct powerop_driver *driver;
};

#define POWEROP_TYPE_CPU_FREQUENCY 0x00000001
#define POWEROP_TYPE_CPU_VOLTAGE 0x00000002
#define POWEROP_TYPE_FRONT_SIDE_BUS_SPEED 0x00000004
...
#define POWEROP_TYPE_GPU_FREQUENCY 0x00010000
...

and if CPU_VOLTAGE and CPU_FREQUENCY can only be modified at the same time (as
most cpufreq drivers require), type is 0x00000003.
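
To make the bitmask idea concrete, here is a minimal userspace sketch. The
helper names powerop_type_covers() and powerop_type_is_coupled() are
hypothetical and only serve as illustration. The type is 32 bits wide because
POWEROP_TYPE_GPU_FREQUENCY (0x00010000) does not fit into 16 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Type bits as proposed above: one bit per tunable parameter. */
#define POWEROP_TYPE_CPU_FREQUENCY        0x00000001u
#define POWEROP_TYPE_CPU_VOLTAGE          0x00000002u
#define POWEROP_TYPE_FRONT_SIDE_BUS_SPEED 0x00000004u
#define POWEROP_TYPE_GPU_FREQUENCY        0x00010000u

/*
 * Hypothetical helper: true if a powerop_value of the given type
 * controls every parameter named in mask.
 */
static bool powerop_type_covers(uint32_t type, uint32_t mask)
{
	return (type & mask) == mask;
}

/*
 * Hypothetical helper: true if more than one parameter bit is set,
 * i.e. the parameters can only be changed together.
 */
static bool powerop_type_is_coupled(uint32_t type)
{
	return type != 0 && (type & (type - 1)) != 0;
}
```

A value whose type is POWEROP_TYPE_CPU_FREQUENCY | POWEROP_TYPE_CPU_VOLTAGE
then covers both parameters and is reported as coupled.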


Secondly, you do not address the cross-relationships between operating points
correctly. If you change the CPU frequency, you may have to switch other
(memory, video) settings; you might even have to validate the frequency
settings for these or even additional reasons (thermal and battery reasons -
ACPI _PPC).
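
As a rough illustration of such validation, the sketch below clamps a
requested CPU frequency against a set of external limits. struct
powerop_limit, powerop_validate_freq() and the kHz unit are assumptions made
for this example only; they are not part of cpufreq or of the PowerOP patches.

```c
#include <stddef.h>

/* Hypothetical limit source, e.g. thermal, battery or ACPI _PPC. */
struct powerop_limit {
	const char *name;
	unsigned long max_khz; /* highest CPU frequency this source allows */
};

/*
 * Return the requested frequency, lowered to the strictest limit if
 * any limit source allows less than what was requested.
 */
static unsigned long powerop_validate_freq(unsigned long requested_khz,
					   const struct powerop_limit *limits,
					   size_t n)
{
	unsigned long allowed = requested_khz;
	size_t i;

	for (i = 0; i < n; i++) {
		if (limits[i].max_khz < allowed)
			allowed = limits[i].max_khz;
	}
	return allowed;
}
```

In a real implementation, each limit source would more likely register a
notifier than sit in a flat array; the array just keeps the sketch short.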

Thirdly, who is to decide on the power management settings? The first and
intuitive answer is the kernel. Therefore, kernel-space cpufreq governors
exist. Only under rare circumstances do you want full userspace control --
that's what the userspace cpufreq governor is for.

Fourthly, the code duplication which your implementation leads to is obvious
for the speedstep-centrino case. And in contrast to Pavel, I do not consider
it a "tiny cleanup".



I'd suggest that you try upgrading the cpufreq infrastructure to provide
full support for multiple types of POWEROPs:

a) Setting of "policies"
   - New "min" or "max" values for all powerop_values are set and verified
     by powerop low-level drivers, powerop governors and external
     notifiers. E.g. if a new frequency min/max pair is required, the
     voltage level gets a new min and max value as well --> you need to
     handle recursion.
   - If necessary, a new "powerop governor" is started.
   - Each powerop governor specifies which POWEROPs it can handle:
     - current cpufreq governors can handle CPU_FREQUENCY,
       CPU_VOLTAGE and FRONT_SIDE_BUS_SPEED
     - a userspace fallback governor always "handles" the
       parameters no other governor handles

b) Setting of "values"
   - Each governor can initiate transitions between the "min" and "max"
     values for the operating points it has acquired ownership of.
   - The new setting is announced to all other governors and to external
     notifiers. If some entity decides it cannot live with this
     new setting, it breaks out. Note that this should not happen very
     often, as the "normal" verification takes place in a) above.
     Nonetheless, if you want to separate CPU_VOLTAGE from CPU_FREQUENCY,
     you need it. And as it makes life for the kernel so much more
     difficult, I'm against doing so.
   - The low-level driver handling the powerop_value is called.
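
The two steps above could look roughly like the following userspace sketch.
It is not a proposed implementation: the dep_for callback, both function
names and the frequency-to-voltage mapping are made up for illustration, and
the cross_dependency chain is assumed to be acyclic so that the recursion
terminates.

```c
#include <stddef.h>

struct powerop_value {
	unsigned long value_cur;
	unsigned long value_min;
	unsigned long value_max;
	struct powerop_value *cross_dependency; /* e.g. voltage for a frequency */
	/* Hypothetical: value the dependent parameter needs for value v. */
	unsigned long (*dep_for)(unsigned long v);
};

/* Made-up mapping from CPU frequency (kHz) to core voltage (mV). */
static unsigned long example_mv_for_khz(unsigned long khz)
{
	return 700 + khz / 4000;
}

/* a) Set a new policy and recurse into the dependent value. */
static int powerop_set_policy(struct powerop_value *v,
			      unsigned long min, unsigned long max)
{
	if (min > max)
		return -1;

	v->value_min = min;
	v->value_max = max;

	/* Keep the current value inside the new range. */
	if (v->value_cur < min)
		v->value_cur = min;
	if (v->value_cur > max)
		v->value_cur = max;

	if (v->cross_dependency && v->dep_for)
		return powerop_set_policy(v->cross_dependency,
					  v->dep_for(min), v->dep_for(max));
	return 0;
}

/* b) Set a value; only transitions within the policy are accepted. */
static int powerop_set_value(struct powerop_value *v, unsigned long target)
{
	if (target < v->value_min || target > v->value_max)
		return -1;
	v->value_cur = target;
	return 0;
}
```

Notifying other governors and calling the low-level driver are left out here;
they would hook in where powerop_set_value() updates value_cur.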

Thanks,
Dominik