Re: [PATCH 2/2] Add a thread cpu time implementation to vDSO

From: Arun Sharma
Date: Mon Dec 12 2011 - 17:50:09 EST


On 12/12/11 12:15 PM, Andrew Lutomirski wrote:

This seems like a neat idea, but I'm a little worried about the
implementation. Surely someone with 4k+ cpus will complain when the
percpu vvar data exceeds a page and crashes. I have no personal
objection to a dynamically-sized vvar area, but it might need more
careful handling.

I'm for disabling the vsyscall when the data doesn't fit in a couple of pages.


I also wonder if there a problem with information leaking. If there
was an entire per-cpu vvar page (perhaps mapped differently on each
cpu) then nothing interesting would leak. But that could add
overhead.


The timing based attacks depend on the granularity of timestamps. I feel what's available here is too coarse grained to be useful. Happy to
disable the code at compile time for those cases. Are there CONFIG_HIGH_SECURITY type of options I could use for this purpose?


diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 0fd7a4a..e36e1c1 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -47,5 +47,6 @@
DECLARE_VVAR(0, volatile unsigned long, jiffies)
DECLARE_VVAR(16, int, vgetcpu_mode)
DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
+DECLARE_VVAR(2048, struct vcpu_data, vcpu_data)

Any reason this isn't page-aligned? Offset 2048 seems like an odd place.


I meant it to be 128 + sizeof(struct vsyscall_gtod_data) so we fit everything in one page if CONFIG_NR_CPUS is small enough.

I'll fix it up in the next rev.

notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
{
long ret;
- asm("syscall" : "=a" (ret) :
- "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory");
+ asm volatile("syscall" : "=a" (ret) :
+ "0" (__NR_clock_gettime),"D" (clock), "S" (ts) : "memory");
return ret;
}

Huh? This should probably be a separate patch (and probably not a
-stable candidate, since it would take amazing compiler stupidity to
generate anything other than the obvious code). The memory clobber
may also be enough to make this officially safe.

Yes - this should be a separate patch. gcc-4.4 likes to get rid of the instruction in __do_thread_cpu_time without the asm volatile (in spite of the memory clobber).


+ if (vp->tsc_unstable) {
+ struct timespec ts;
+ vdso_fallback_gettime(CLOCK_THREAD_CPUTIME_ID,&ts);
+ return timespec_to_ns(&ts);
+ }

Yuck -- another flag indicating whether we're using the tsc.

I renamed it to thread_cputime_disabled to deal with NR_CPUS > 64.


+
+ do {
+ native_read_tscp(&p);

Do all 64-bit cpus have rdtscp? ISTR older Intel machines don't have it.


Yeah - I ran into this one when testing with KVM on a rdtscp capable CPU. Will disable the vsyscall when the CPU doesn't support rdtscp.

--- a/arch/x86/vdso/vdso.lds.S
+++ b/arch/x86/vdso/vdso.lds.S
@@ -25,6 +25,8 @@ VERSION {
__vdso_getcpu;
time;
__vdso_time;
+ thread_cpu_time;
+ __vdso_thread_cpu_time;
local: *;
};
}

Why do we have the non-__vdso versions? IMO anything that actually
uses them is likely to be confused. They have the same names as the
glibc wrappers, but the glibc wrappers have different return value and
errno semantics. I'd say just use __vdso.

I'm not familiar with the history of __vdso_foo vs foo. Happy to get rid of the non __vdso versions.


Also, what's wrong with just adding the functionality to __vdso_clock_gettime?


Performance. Most of this is client dependent (whether it expects time in nanoseconds or timespec).

Baseline: 19.34 secs
vclock_gettime: 4.74 secs
thread_cpu_time: 3.62 secs

Our use case prefers nanoseconds, so the conversion to timespec and back adds overhead. Also, having a direct entry point vs going through the switch in clock_gettime() adds some cycles.

I have some people here asking me if they could get CLOCK_REALTIME and CLOCK_MONOTONIC also in nanoseconds for the same reason.


+struct vcpu_data {
+ struct vpercpu_data vpercpu[NR_CPUS];
+ unsigned int tsc_khz;

I think this also duplicates some of the gtod_data stuff, at least in
the case that TSC works for gtod.


vgotd.h seems to be clock source independent. This vsyscall is usable only on systems with tsc_unstable == 0. Also, there is only one instance of mult and shift there, whereas we save it on a per-cpu basis in the VVAR page.

Thanks for the review and comments!

-Arun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/