Re: [PATCH 1/1] sched/cputime: Mitigate performance regression in times()/clock_gettime()

From: Ingo Molnar
Date: Wed Aug 10 2016 - 14:32:37 EST



* Giovanni Gherdovich <ggherdovich@xxxxxxx> wrote:

> Commit 6e998916dfe3 ("sched/cputime: Fix clock_nanosleep()/clock_gettime()
> inconsistency") fixed a problem whereby clock_nanosleep() followed by
> clock_gettime() could allow a task to wake early. It addressed the problem
> by calling the scheduling class's update_curr() when the cputimer starts.
>
> Said change induced a considerable performance regression on the syscalls
> times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID). There are some
> debuggers and applications that monitor their own performance that
> accidentally depend on the performance of these specific calls.
>
> This patch mitigates the performance loss by prefetching data into the CPU
> cache, as stalls due to cache misses appear to be where most time is spent
> in our benchmarks.
>
> Here are the performance gains of this patch over v4.7-rc7 on a Sandy Bridge
> box with 32 logical cores and 2 NUMA nodes. The test is repeated with a
> variable number of threads, from 2 to 4*num_cpus; the results are in
> seconds and correspond to the average of 10 runs; the percentage gain is
> computed with (before-after)/before so a positive value is an improvement
> (it's faster). The improvement varies between a few percent for 5-20
> threads and more than 10% for 2 or >20 threads.
>
> pound_clock_gettime:
>
> threads    4.7-rc7    patched 4.7-rc7
> [num]      [secs]     [secs (percent)]
>     2      3.48       3.06 ( 11.83%)
>     5      3.33       3.25 (  2.40%)
>     8      3.37       3.26 (  3.30%)
>    12      3.32       3.37 ( -1.60%)
>    21      4.01       3.90 (  2.74%)
>    30      3.63       3.36 (  7.41%)
>    48      3.71       3.11 ( 16.27%)
>    79      3.75       3.16 ( 15.74%)
>   110      3.81       3.25 ( 14.80%)
>   128      3.88       3.31 ( 14.76%)
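
For anyone without the benchmark at hand, here is a minimal sketch of what a
pound_clock_gettime-style test presumably does: N threads of one process call
clock_gettime(CLOCK_PROCESS_CPUTIME_ID) in a tight loop while the wall-clock
time is measured. This is not the actual benchmark source. The function names
pound(), now_seconds(), pound_clock_gettime() and percent_gain() and the
calls-per-thread parameter are assumptions made for illustration; only the
(before-after)/before formula comes from the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical worker: hammer the process-wide CPU-time clock. */
static void *pound(void *arg)
{
	long calls = *(long *)arg;
	struct timespec ts;

	for (long i = 0; i < calls; i++)
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
	return NULL;
}

/* Wall-clock time in seconds, taken from the monotonic clock. */
static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/*
 * Run 'nthreads' workers, each issuing 'calls_per_thread' calls.
 * Returns the elapsed wall-clock seconds, or -1.0 on failure.
 */
double pound_clock_gettime(int nthreads, long calls_per_thread)
{
	assert(nthreads > 0 && calls_per_thread >= 0);

	pthread_t *tids = malloc(nthreads * sizeof(*tids));
	if (!tids)
		return -1.0;

	double start = now_seconds();
	int started = 0;

	for (; started < nthreads; started++)
		if (pthread_create(&tids[started], NULL, pound,
				   &calls_per_thread) != 0)
			break;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	double elapsed = now_seconds() - start;

	free(tids);
	return started == nthreads ? elapsed : -1.0;
}

/* Gain as defined above: positive means the patched kernel is faster. */
double percent_gain(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

Build with something like "gcc -O2 -pthread", call
pound_clock_gettime(nthreads, calls) on both kernels for each thread count,
and feed the two averaged timings to percent_gain().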

Nice detective work! I'm wondering, where do we stand compared with a
pre-6e998916dfe3 kernel?

I admit this is a difficult question: 6e998916dfe3 does not revert cleanly and I
suspect v3.17 does not run easily on a recent distro. Could you perhaps attempt to
revert the bad effects of 6e998916dfe3, just to get numbers? I.e. don't try to make
the result correct, just see roughly what the performance gap is.

If there's still a significant gap then it might make sense to optimize this some
more.

Thanks,

Ingo