Re: [tip:sched/core] sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
From: Frederic Weisbecker
Date: Tue Mar 01 2016 - 10:35:13 EST
2016-02-29 16:31 GMT+01:00 Frederic Weisbecker <fweisbec@xxxxxxxxx>:
> 2016-02-29 12:18 GMT+01:00 tip-bot for Rik van Riel <tipbot@xxxxxxxxx>:
>> Commit-ID: ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
>> Gitweb: http://git.kernel.org/tip/ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
>> Author: Rik van Riel <riel@xxxxxxxxxx>
>> AuthorDate: Wed, 10 Feb 2016 20:08:27 -0500
>> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
>> CommitDate: Mon, 29 Feb 2016 09:53:10 +0100
>>
>> sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
>>
>> When profiling syscall overhead on nohz-full kernels,
>> after removing __acct_update_integrals() from the profile,
>> native_sched_clock() remains as the top CPU user. This can be
>> reduced by moving VIRT_CPU_ACCOUNTING_GEN to jiffy granularity.
>>
>> This will reduce timing accuracy on nohz_full CPUs to jiffy
>> based sampling, just like on normal CPUs. It results in
>> totally removing native_sched_clock from the profile, and
>> significantly speeding up the syscall entry and exit path,
>> as well as irq entry and exit, and KVM guest entry & exit.
>>
>> Additionally, only call the more expensive functions (and
>> advance the seqlock) when jiffies actually changed.
>>
>> This code relies on another CPU advancing jiffies when the
>> system is busy. On a nohz_full system, this is done by a
>> housekeeping CPU.
>>
>> A microbenchmark calling an invalid syscall number 10 million
>> times in a row speeds up an additional 30% over the numbers
>> with just the previous patches, for a total speedup of about
>> 40% over 4.4 and 4.5-rc1.
>>
>> Run times for the microbenchmark:
>>
>> 4.4                          3.8 seconds
>> 4.5-rc1                      3.7 seconds
>> 4.5-rc1 + first patch        3.3 seconds
>> 4.5-rc1 + first 3 patches    3.1 seconds
>> 4.5-rc1 + all patches        2.3 seconds
>>
>> A non-NOHZ_FULL cpu (not the housekeeping CPU):
>>
>> all kernels                  1.86 seconds
>>
>> Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>> Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Mike Galbraith <efault@xxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: clark@xxxxxxxxxx
>> Cc: eric.dumazet@xxxxxxxxx
>> Cc: fweisbec@xxxxxxxxx
>
> It seems the tip bot doesn't parse the Cc tags correctly, as I wasn't
> cc'ed on this commit.
>
> Also, I wish I'd had a chance to test and ack this patch before it got
> applied. I guess I should have said I was on vacation for the last
> few weeks.
>
> I'm going to run it through some tests.
OK, I did some simple tests (kernel loops, user loops) and it seems to
account the cputime accurately. The kernel loop consists of brk()
calls (borrowed from an old test from Steve), and it also accounted the
small fragments of user time spent between the syscalls.
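A test along the lines described above could look roughly like the following. This is a hypothetical sketch, not the test borrowed from Steve (that code is not shown in this thread): `kernel_loop()`, `user_loop()`, `tv_to_sec()` and `read_cputime()` are invented names. `kernel_loop()` calls `brk()` with the current break, so each iteration is a cheap syscall that leaves the heap unchanged; `user_loop()` spins in user space; `read_cputime()` reads the process's user and system time via `getrusage(RUSAGE_SELF, ...)`. It assumes Linux with glibc, where `brk()` and `sbrk()` are declared under `_DEFAULT_SOURCE`.

```c
#define _DEFAULT_SOURCE   /* needed by glibc to declare brk() and sbrk() */
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical "kernel loop": brk() to the current break is a syscall
 * that succeeds without changing the heap, so almost all time spent
 * here is system time. Returns the number of failed calls. */
static long kernel_loop(long iters)
{
	void *cur = sbrk(0);   /* current program break */
	long failures = 0;

	for (long i = 0; i < iters; i++)
		if (brk(cur) != 0)
			failures++;
	return failures;
}

/* Hypothetical "user loop": pure computation, no syscalls.
 * volatile keeps the compiler from folding the loop away. */
static unsigned long user_loop(long iters)
{
	volatile unsigned long x = 0;

	for (long i = 0; i < iters; i++)
		x += (unsigned long)i;
	return x;
}

static double tv_to_sec(struct timeval tv)
{
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Read the calling process's user and system CPU time in seconds. */
static void read_cputime(double *user, double *sys)
{
	struct rusage ru;
	int rc = getrusage(RUSAGE_SELF, &ru);

	assert(rc == 0);   /* RUSAGE_SELF should never fail */
	*user = tv_to_sec(ru.ru_utime);
	*sys = tv_to_sec(ru.ru_stime);
}
```

A driver would call `read_cputime()` before and after each loop and compare the deltas. For the syscall loop one would expect most of the delta to land in system time, with a small user-time share coming from the loop overhead between the syscalls.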
So it looks like a very nice improvement. Thanks a lot, Rik!
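The mechanism described in the quoted commit message can be illustrated with a small user-space simulation. The commit accounts time in whole jiffies and only enters the seqlock write section when jiffies has actually advanced, relying on a housekeeping CPU to move jiffies forward. This is a minimal sketch, not the kernel's implementation: `sim_jiffies`, `struct sim_vtime`, `sim_vtime_delta()` and `sim_account_system()` are hypothetical names, the plain counter `seq` only stands in for a real seqcount, and the caller plays the role of the housekeeping CPU by advancing `sim_jiffies` by hand.

```c
#include <assert.h>

/* Hypothetical stand-in for the global jiffies counter. In this
 * simulation the caller advances it, playing the housekeeping CPU. */
static unsigned long sim_jiffies;

/* Hypothetical per-task accounting state (invented field names). */
struct sim_vtime {
	unsigned long snap;    /* jiffies value at the last accounting */
	unsigned long stime;   /* accumulated system time, in jiffies */
	unsigned long seq;     /* stand-in for a seqcount: odd while writing */
};

/* Cheap check: how many jiffies have elapsed since the last snapshot?
 * The signed difference treats wraparound the same way the kernel's
 * time_before() does; "now" at or before "snap" yields 0. */
static unsigned long sim_vtime_delta(const struct sim_vtime *vt)
{
	unsigned long now = sim_jiffies;

	if ((long)(now - vt->snap) <= 0)
		return 0;
	return now - vt->snap;
}

/* Called on every simulated syscall/irq/guest exit. The write section
 * (seq bumps plus accumulation) runs only when jiffies has advanced. */
static void sim_account_system(struct sim_vtime *vt)
{
	unsigned long delta = sim_vtime_delta(vt);

	if (!delta)
		return;                 /* fast path: nothing to account yet */

	vt->seq++;                      /* "write begin": seq becomes odd */
	vt->stime += delta;
	vt->snap += delta;              /* snap now equals sim_jiffies */
	vt->seq++;                      /* "write end": seq even again */
	assert((vt->seq & 1) == 0);
}
```

With `sim_jiffies` unchanged, any number of calls to `sim_account_system()` take the early-return path and leave `stime` and `seq` untouched; once the counter moves forward by three ticks, a single call accounts 3 jiffies and bumps `seq` twice.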