Re: Dynamic configure max_cstate

From: Len Brown
Date: Tue Jul 28 2009 - 15:47:25 EST


> When running a fio workload, I found sometimes cpu C state has
> big impact on the result. Mostly, fio is a disk I/O workload
> which doesn't spend much time with cpu, so cpu switch to C2/C3
> frequently and the latency is big.
>
> If I start kernel with idle=poll or processor.max_cstate=1,
> the result is quite good. Consider a scenario that machine is
> busy at daytime and free at night. Could we add a dynamic
> configuration interface for processor.max_cstate or something
> similar with sysfs? So user applications could change the
> max_cstate dynamically? For example, we could add a new
> parameter to function cpuidle_governor->select to mark the
> highest c state.

max_cstate is a debug param. It isn't a run-time API and never will be.
User-space shouldn't need to know or care about C-states,
and if it appears it needs to, then we have a bug we need to fix.

The interface in Documentation/power/pm_qos_interface.txt
is supposed to handle this. Though if the underlying code
is not noticing IO interrupts, then it can't help.
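A minimal user-space sketch of the pm_qos route follows. It assumes the /dev/cpu_dma_latency device described in that document and sufficient privileges (normally root). The process writes its latency limit, in microseconds, as a raw 32-bit integer and keeps the descriptor open for as long as the limit should apply. The helper name request_cpu_dma_latency() and its path argument, which is there only so the helper can be pointed at an ordinary file for testing, are hypothetical and not part of any kernel API.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: register a CPU DMA latency request with PM QoS.
 * The request stays in force only while the returned descriptor is open;
 * close() (or process exit) withdraws it.
 * Returns the open fd on success, or -1 on error.
 */
int request_cpu_dma_latency(const char *path, int32_t max_latency_us)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* The limit is written as a raw 32-bit integer, in microseconds. */
    if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
        (ssize_t)sizeof(max_latency_us)) {
        perror("write");
        close(fd);
        return -1;
    }

    /* Keep this fd open for the lifetime of the latency-sensitive work. */
    return fd;
}
```

A benchmark wrapper could, for example, call request_cpu_dma_latency("/dev/cpu_dma_latency", 0) before starting I/O and close() the descriptor afterwards, which keeps the day/night policy from the original question in user space without touching max_cstate. As noted above, it only helps if the idle code actually sees the IO interrupts.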

Another thing to look at is processor.latency_factor
which you can change at run-time in
/sys/module/processor/parameters/latency_factor
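Changing it at run-time is just a decimal write to that sysfs file, the same as an "echo N >" from a root shell. The sketch below assumes the processor module on the running kernel exposes latency_factor as writable; the helper names write_uint_param() and read_uint_param() are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write an unsigned module parameter through sysfs
 * as decimal text, e.g. /sys/module/processor/parameters/latency_factor.
 * Returns 0 on success, -1 on error.
 */
int write_uint_param(const char *path, unsigned int value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%u\n", value) > 0;

    /* stdio buffers the text, so a value rejected by the kernel
     * shows up as an error when the write is flushed by fclose(). */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* Hypothetical helper: read the parameter back; returns -1 on error. */
long read_uint_param(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    unsigned int value;
    int n = fscanf(f, "%u", &value);
    fclose(f);

    return n == 1 ? (long)value : -1;
}
```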

We multiply the advertised exit latency by this
before deciding to enter a C-state. The concept
is that ACPI reports a performance number, but what
we really want is a power break-even. Anyway,
we know the default multiple is too low, and will be
raising it shortly.
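To make the multiply concrete, here is a deliberately simplified selection loop. It is a hypothetical sketch, not the ACPI or cpuidle code: struct cstate, pick_cstate() and the sample latencies are made up for illustration. It treats exit_latency * latency_factor as the break-even residency a state needs and also honors a pm_qos-style latency limit.

```c
#include <stddef.h>

/* Hypothetical, simplified C-state descriptor (not the kernel's struct). */
struct cstate {
    const char  *name;
    unsigned int exit_latency_us;   /* exit latency as advertised by ACPI */
};

/*
 * Pick the deepest state that is worth entering:
 *  - its scaled exit latency (a stand-in for the power break-even time)
 *    must fit inside the predicted idle period, and
 *  - its raw exit latency must not exceed the QoS latency limit.
 * states[] must be sorted by increasing exit latency; states[0] is
 * always allowed.
 */
size_t pick_cstate(const struct cstate *states, size_t n,
                   unsigned int predicted_idle_us,
                   unsigned int latency_factor,
                   unsigned int qos_latency_us)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++) {
        unsigned long long break_even =
            (unsigned long long)states[i].exit_latency_us * latency_factor;

        if (break_even > predicted_idle_us)
            break;      /* idle period too short to pay off */
        if (states[i].exit_latency_us > qos_latency_us)
            break;      /* would violate the latency limit */
        best = i;
    }
    return best;
}
```

With a toy table of C1 = 1 us, C2 = 20 us and C3 = 100 us, raising latency_factor from 2 to 6 moves a predicted 250 us idle period from C3 to C2. If predicted_idle_us is itself wrong because IO interrupts are not being predicted, no choice of factor or QoS limit fixes the decision.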

Of course if the current code is not predicting any
IO interrupts on your IO-only workload, this, like
pm_qos, will not help.

cheers,
-Len Brown, Intel Open Source Technology Center