Re: [PATCH 1/3] sched/cputime: Improve scalability of times()/clock_gettime() on 32 bit cpus

From: Peter Zijlstra
Date: Thu Sep 01 2016 - 06:29:37 EST


On Thu, Sep 01, 2016 at 12:07:34PM +0200, Stanislaw Gruszka wrote:
> On Thu, Sep 01, 2016 at 11:49:06AM +0200, Peter Zijlstra wrote:
> > You're now making rather hot paths slower to benefit a rather slow path,
> > that too is backwards.
>
> OK, you are right, I made update_curr() slower (only a bit, I think,
> since this new seqcount primitive should be in the same cache line as
> other things).

seqcount adds 2 smp_wmb() calls, which are not free on ARM (it is
possible to do it with just 1, FWIW).
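
To make the two barriers concrete, here is a minimal userspace sketch of a
seqcount-style writer and reader, assuming C11 atomics as a stand-in for the
kernel primitives. The names seq_u64, seq_u64_write() and seq_u64_read() are
hypothetical, the release fences only approximate smp_wmb(), and a single
writer is assumed. It shows the two-barrier form only, not the one-barrier
variant.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace analogue of a seqcount guarding a 64-bit value
 * that is stored as two 32-bit halves (as on a 32-bit CPU). */
struct seq_u64 {
	_Atomic unsigned int seq;	/* even: stable, odd: write in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Write side, single writer assumed. Note the two fences: one after
 * marking the sequence odd, one before marking it even again. */
static void seq_u64_write(struct seq_u64 *s, uint64_t v)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* barrier 1 (roughly smp_wmb()) */

	atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);

	atomic_thread_fence(memory_order_release);	/* barrier 2 (roughly smp_wmb()) */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_relaxed);
}

/* Read side: retry while a write is in progress or the sequence changed. */
static uint64_t seq_u64_read(struct seq_u64 *s)
{
	unsigned int start, end;
	uint32_t lo, hi;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((start & 1) || start != end);

	return ((uint64_t)hi << 32) | lo;
}
```

Both fences sit on the write side, which is the hot path here
(update_curr()); the reader only pays for loads and a possible retry.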

> But don't we care about inconsistent access to a 64-bit variable
> on 32-bit processors (see patch 3)? I know an inconsistency is an
> unlikely scenario, but I assume it's still possible, or not?

It's actually quite possible. We've observed it a fair few times. A 64-bit
variable is accessed as two 32-bit stores/loads, and getting interleaved
data is quite possible.
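
As an illustration of the interleaving, here is a small deterministic sketch
that replays one bad ordering by hand rather than racing real threads. The
names split_u64 and torn_value() are hypothetical, and the low-half-first
store order is an assumption; the actual order depends on the compiler and
architecture.

```c
#include <stdint.h>

/* A 64-bit value as a 32-bit CPU sees it: two separate 32-bit words. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t split_u64_join(struct split_u64 v)
{
	return ((uint64_t)v.hi << 32) | v.lo;
}

/* Return what a reader observes if it loads both halves after the writer
 * has stored the low half of new_val but before it has stored the high
 * half. The low-first store order is an assumption of this sketch. */
static uint64_t torn_value(uint64_t old_val, uint64_t new_val)
{
	struct split_u64 mem = {
		.lo = (uint32_t)old_val,
		.hi = (uint32_t)(old_val >> 32),
	};

	mem.lo = (uint32_t)new_val;	/* first 32-bit store has landed */

	/* The reader runs here; the store to mem.hi has not happened yet. */
	return split_u64_join(mem);
}
```

With old_val = 0x00000000ffffffff and new_val = 0x0000000100000000 the reader
sees 0: the low word has already wrapped while the high word is still the old
one, so a nanosecond counter would appear to jump back by about 4.29 seconds.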

> If not, I can get rid of read_sum_exec_runtime() and just read
> sum_exec_runtime without task_rq_lock() protection in
> thread_group_cputime(). That would make the benchmark happy.

I think this benchmark is misguided. Just accept that O(nr_threads) is
expensive, as is the process-wide itimer; just don't use them when you
care about performance.
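
For userspace, one possible reading of that advice is sketched below, assuming
a POSIX system: query the calling thread's own CPU clock instead of the
process-wide one, which has to account for every thread. The helper
cpu_time_ns() is hypothetical; clock_gettime(), CLOCK_THREAD_CPUTIME_ID and
CLOCK_PROCESS_CPUTIME_ID are the standard POSIX interfaces. Whether
per-thread accounting is enough depends on the application.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a CPU-time clock and return nanoseconds,
 * or -1 if clock_gettime() fails. */
static int64_t cpu_time_ns(clockid_t clk)
{
	struct timespec ts;

	if (clock_gettime(clk, &ts) != 0)
		return -1;

	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Per-thread CPU time: only the calling thread is accounted. */
static int64_t thread_cpu_ns(void)
{
	return cpu_time_ns(CLOCK_THREAD_CPUTIME_ID);
}

/* Process-wide CPU time: covers all threads of the process, which is
 * the O(nr_threads) path discussed in this thread. */
static int64_t process_cpu_ns(void)
{
	return cpu_time_ns(CLOCK_PROCESS_CPUTIME_ID);
}
```

A hot loop that only needs its own runtime could call thread_cpu_ns() and
leave process_cpu_ns() for occasional reporting.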