Re: [PATCH v2] cpuidle: Add 'above' and 'below' idle state metrics

From: Peter Zijlstra
Date: Mon Dec 10 2018 - 07:21:24 EST


On Mon, Dec 10, 2018 at 12:30:23PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> Add two new metrics for CPU idle states, "above" and "below", to count
> the number of times the given state had been asked for (or entered
> from the kernel's perspective), but the observed idle duration turned
> out to be too short or too long for it (respectively).
>
> These metrics help to estimate the quality of the CPU idle governor
> in use.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>

> @@ -260,6 +262,33 @@ int cpuidle_enter_state(struct cpuidle_d
> dev->last_residency = (int)diff;
> dev->states_usage[entered_state].time += dev->last_residency;
> dev->states_usage[entered_state].usage++;
> +
> + if (diff < drv->states[entered_state].target_residency) {
> + for (i = entered_state - 1; i >= 0; i--) {
> + if (drv->states[i].disabled ||
> + dev->states_usage[i].disable)
> + continue;
> +
> + /* Shallower states are enabled, so update. */
> + dev->states_usage[entered_state].above++;
> + break;
> + }
> + } else if (diff > delay) {
> + for (i = entered_state + 1; i < drv->state_count; i++) {
> + if (drv->states[i].disabled ||
> + dev->states_usage[i].disable)
> + continue;
> +
> + /*
> + * Update if a deeper state would have been a
> + * better match for the observed idle duration.
> + */
> + if (diff - delay >= drv->states[i].target_residency)
> + dev->states_usage[entered_state].below++;
> +
> + break;
> + }
> + }

One question on this; why is this tracked unconditionally?

Would not a tracepoint be better?; then there is no overhead in the
normal case where nobody gives a crap about these here numbers.