Re: [GIT PULL] First batch of KVM changes for 4.1

From: Andy Lutomirski
Date: Fri Apr 17 2015 - 17:43:03 EST


On Fri, Apr 17, 2015 at 2:28 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Apr 17, 2015 at 4:39 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>
>> On my box, vclock_gettime using kvm-clock is about 40 ns. An empty
>> syscall is about 33 ns. clock_gettime *should* be around 17 ns.
>>
>> The clock_gettime syscall is about 73 ns.
>>
>> Could we figure out why clock_gettime (the syscall) is so slow
>
> If we only could profile some random program (let's call it "a.out"
> that did the syscall(__NR_gettime_syscall) a couple million times.
>
> Oh, lookie here, Santa came around:
>
> 21.83% [k] system_call
> 12.85% [.] syscall
> 9.76% [k] __audit_syscall_exit
> 9.55% [k] copy_user_enhanced_fast_string
> 4.68% [k] __getnstimeofday64
> 4.08% [k] syscall_trace_enter_phase1
> 3.85% [k] __audit_syscall_entry
> 3.77% [k] unroll_tree_refs
> 3.15% [k] sys_clock_gettime
> 2.92% [k] int_very_careful
> 2.73% [.] main
> 2.35% [k] syscall_trace_leave
> 2.28% [k] read_tsc
> 1.73% [k] int_restore_rest
> 1.73% [k] int_with_check
> 1.48% [k] syscall_return
> 1.32% [k] dput
> 1.24% [k] system_call_fastpath
> 1.21% [k] syscall_return_via_sysret
> 1.21% [k] tracesys
> 0.81% [k] do_audit_syscall_entry
> 0.80% [k] current_kernel_time
> 0.73% [k] getnstimeofday64
> 0.68% [k] path_put
> 0.66% [k] posix_clock_realtime_get
> 0.61% [k] int_careful
> 0.60% [k] mntput
> 0.49% [k] kfree
> 0.36% [k] _copy_to_user
> 0.31% [k] int_ret_from_sys_call_irqs_off
>
> looks to me like it's spending a lot of time in system call auditing.
> Which makes no sense to me, since none of this should be triggering
> any auditing. And there's a lot of time in low-level kernel system
> call assembly code.

Muahaha. The auditors have invaded your system. (I did my little
benchmark with a more sensible configuration -- see way below).

Can you send the output of:

# auditctl -s
# auditctl -l

Are you, perchance, using Fedora? I lobbied rather heavily, and
successfully, to get the default configuration to stop auditing.
Unfortunately, the fix wasn't retroactive, so, unless you have a very
fresh install, you might want to apply the fix yourself:

https://fedorahosted.org/fesco/ticket/1311

>
> If I only remembered the name of the crazy person who said "Ok" when I
> suggest he just be the maintainer of the code since he has spent a lot
> of time sending patches for it. Something like Amdy Letorsky. No, that
> wasn't it. Hmm. It's on the tip of my tongue.
>
> Oh well. Maybe somebody can remember the guys name. It's familiar for
> some reason. Andy?

Amdy Lumirtowsky thinks he meant to attach a condition to his
maintainerish activities: he will do his best to keep the audit code
*out* of the low-level stuff, but he's going to try to avoid ever
touching the audit code itself, because if he ever had to change it,
he might accidentally delete the entire file.

Seriously, wasn't there a TAINT_PERFORMANCE thing proposed at some
point? I would love auditing to set some really loud global warning
that you've just done a Bad Thing (tm) performance-wise by enabling
it.

Back to timing. With kvm-clock, I see:

23.80% timing_test_64 [kernel.kallsyms] [k] pvclock_clocksource_read
15.57% timing_test_64 libc-2.20.so [.] syscall
12.39% timing_test_64 [kernel.kallsyms] [k] system_call
12.35% timing_test_64 [kernel.kallsyms] [k] copy_user_generic_string
10.95% timing_test_64 [kernel.kallsyms] [k] system_call_after_swapgs
7.35% timing_test_64 [kernel.kallsyms] [k] ktime_get_ts64
6.20% timing_test_64 [kernel.kallsyms] [k] sys_clock_gettime
3.62% timing_test_64 [kernel.kallsyms] [k] system_call_fastpath
2.08% timing_test_64 timing_test_64 [.] main
1.72% timing_test_64 timing_test_64 [.] syscall@plt
1.58% timing_test_64 [kernel.kallsyms] [k] kvm_clock_get_cycles
1.22% timing_test_64 [kernel.kallsyms] [k] _copy_to_user
0.65% timing_test_64 [kernel.kallsyms] [k] posix_ktime_get_ts
0.13% timing_test_64 [kernel.kallsyms] [k] apic_timer_interrupt

We've got some silly indirection, a uaccess that probably didn't get
optimized very well, and the terrifying function
pvclock_clocksource_read.

By comparison, using tsc:

19.51% timing_test_64 libc-2.20.so [.] syscall
15.52% timing_test_64 [kernel.kallsyms] [k] system_call
15.25% timing_test_64 [kernel.kallsyms] [k] copy_user_generic_string
14.34% timing_test_64 [kernel.kallsyms] [k] system_call_after_swapgs
8.66% timing_test_64 [kernel.kallsyms] [k] ktime_get_ts64
6.95% timing_test_64 [kernel.kallsyms] [k] sys_clock_gettime
5.93% timing_test_64 [kernel.kallsyms] [k] native_read_tsc
5.12% timing_test_64 [kernel.kallsyms] [k] system_call_fastpath
2.62% timing_test_64 timing_test_64 [.] main

That's better, although the uaccess silliness is still there.

(No PEBS -- I don't think it works right in KVM.)

--Andy

>
> Linus



--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/