Re: [RFC] nohz: no update idle entry time on get_cpu_idle/iowait_time_us()

From: Thomas Gleixner
Date: Sun Oct 11 2015 - 14:39:18 EST


On Mon, 28 Sep 2015, Yunhong Jiang wrote:
> Currently the get_cpu_idle/iowait_time_us() updates the idle_entrytime.
> When it's invoked from another CPU and the target CPU has been on idle
> already, it will update the idle_entrytime to now, which is incorrect.
>
> However, the get_cpu_idle/iowait_time_us() is not guranteed to be called
> on the target CPU. For example, the get_cpu_idle_time_us() seems is
> invoked remotely on drivers/cpufreq/cpufreq_governor.c through
> get_cpu_idle_time().
>
> There is a check that the update happens only if a valid last_update_time
> parameter passed. IMHO, this is more a hack because there is no guarantee
> that it's invoked on the target CPU when last_update_time is valid.

Looking at the call sites, this last_update_time parameter is
silly. Why is the calling code not taking the timestamp? There is
hardly a requirement that this needs to be the same timestamp as the
one which is used to calculate idle time. That cpufreq calculations
are speculative anyway.

So we better get rid of that parameter completely.

> In fact, we don't need update the idle stats from
> get_cpu_idle/iowait_time_us(). Now the policy is, we record the
> entrytime when tick_nohz_start_idle() and update the stats
> when tick_nohz_stop_idle(). We calculate the stats on other situations.
>
> Please notice:
> 1) There is a bug currently that the tick_nohz_stop_idle() calls the
> update_ts_time_stats() and update the idle_entrytime, which is sure
> to be wrong. Removing the idle_entrytime update resolves the bug also.

Care to explain the actual bug and what wreckage it causes? If it's a
real bug then removing "ts->idle_entrytime = now" needs to be a
separate patch. AFAICT, it's a cosmetic issue.

> 2) There is a small widows in tick_nohz_stop_idle() between
> update_ts_time_stats() and clear ts->idle_active, that
> get_cpu_idle/iowait_time_us(), when invoked remotely, may double
> count last idle period. This window exists w/o this change and this
> change does not fix it.

Calling any of those functions from a remote cpu is broken to begin
with, especially on 32bit machines. And that does not change with your
patch at all.

What we really need here is protecting the idle stats fields with a
raw_spinlock.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/