RE: [PATCH] x86: Export tsc related information in sysfs

From: Dan Magenheimer
Date: Sun May 16 2010 - 12:44:11 EST


> From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
> What we can talk about is a vget_tsc_raw() interface along with a
> vconvert_tsc_delta() interface, where vget_tsc_raw() returns you a
> nasty error code for everything which is not usable.

I'm open to something like that provided:

1) It works (whenever possible) without changing privilege levels
or causing vmexits or other "hidden slowness" problems, both on
bare-metal Linux and inside a virtual machine.
2) The "transformation" the kernel applies to the TSC does not
require some hidden physical CPU (pcpu) number that won't work
in a virtual machine.

If the TSC is indeed reliable (see below), reading it directly is
both faster AND meets the above constraints.
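To make constraint 2 concrete: no vconvert_tsc_delta() exists today, so
what follows is only a minimal sketch of what the conversion half could
look like, assuming the kernel published one global mult/shift pair (the
same fixed-point scaling its clocksource code applies to cycle counts).
Because the scale is global, converting a delta needs no per-CPU number.
The names struct tsc_scale, tsc_scale_from_hz() and tsc_delta_to_ns()
are hypothetical, and the TSC frequency is assumed to be known.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* HYPOTHETICAL scale factors the kernel could publish to user space.
 * They are global: no per-CPU offset is needed to convert a delta. */
struct tsc_scale {
	uint32_t mult;
	uint32_t shift; /* 0..32 */
};

/* Choose the largest shift <= 32 whose mult still fits in 32 bits,
 * which maximises precision. tsc_hz must be nonzero. */
static struct tsc_scale tsc_scale_from_hz(uint64_t tsc_hz)
{
	assert(tsc_hz != 0);
	for (uint32_t shift = 32; ; shift--) {
		uint64_t mult = (NSEC_PER_SEC << shift) / tsc_hz;
		if (mult <= UINT32_MAX || shift == 0) {
			struct tsc_scale s = { (uint32_t)mult, shift };
			return s;
		}
	}
}

/* ns = (delta * mult) >> shift, computed by splitting delta into 32-bit
 * halves so the intermediate products fit in 64 bits (shift <= 32). */
static uint64_t tsc_delta_to_ns(uint64_t delta, struct tsc_scale s)
{
	uint64_t lo = delta & 0xffffffffULL;
	uint64_t hi = delta >> 32;
	uint64_t ns = (lo * s.mult) >> s.shift;

	ns += (hi * s.mult) << (32 - s.shift);
	return ns;
}
```

For a 2 GHz TSC this picks shift = 32 and mult = 2^31, so a delta of
2,000,000,000 cycles converts to exactly 1,000,000,000 ns. Splitting the
delta avoids needing a 128-bit multiply.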

> > From: Arjan van de Ven [mailto:arjan@xxxxxxxxxxxxx]
> > If you want a sysfs variable that is always 0... go wild.
>
> From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
> Nah, there are systems which will have it set to 1:
> Dig out your good old Pentium-I box and enjoy.

Hot stove syndrome again? Are you truly saying that NO
single-socket multi-core system is free of stupid firmware
(SMI and/or BIOS)? Or are you saying that significant TSC
clock skew occurs even between the cores of a single-socket
Nehalem system?

If things are this bad, why on earth would the kernel itself
EVER use the TSC as its own internal clocksource, or even to
add precision to a slow platform timer?

Or are you saying that many systems (especially large
multi-socket systems) DO exist where the kernel cannot
proactively determine that the firmware is broken and/or
that significant thermal variation may occur across sockets?
That I believe.

I understand that you are both involved in pushing the
limits of large systems, and that time synchronization in
these systems is a very hard problem, perhaps effectively
unsolvable.

But that doesn't mean the vast majority of latest-generation
single-socket systems can't set "tsc_reliable" to 1. Nor does
it mean that the kernel is responsible for detecting and/or
correcting every system with buggy firmware.

Maybe the best way to solve the "buggy firmware problem" is
precisely to encourage enterprise apps to use the TSC, and to
expose and *blacklist* systems and/or system vendors that ship
boxes with crappy firmware!

> From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
> What we could expose is an estimate about the performance of
> gettimeofday/clock_gettime. The kernel has all the information to do
> that, but this still does not solve the notification problem when we
> need to switch to a different clock source.

This would at least be a big step in the right direction.
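Until the kernel exports such an estimate, an application can only
measure the cost itself. A rough user-space sketch, assuming
CLOCK_MONOTONIC and a simple timing loop; the function names are
hypothetical, and the result includes loop overhead:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static uint64_t ts_to_ns(struct timespec ts)
{
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Average wall time of one clock_gettime(CLOCK_MONOTONIC) call, in ns,
 * measured over `iterations` back-to-back calls (loop overhead included). */
static double estimate_clock_gettime_ns(unsigned iterations)
{
	struct timespec start, now, end;

	assert(iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (unsigned i = 0; i < iterations; i++)
		clock_gettime(CLOCK_MONOTONIC, &now);
	clock_gettime(CLOCK_MONOTONIC, &end);
	return (double)(ts_to_ns(end) - ts_to_ns(start)) / iterations;
}
```

As Thomas notes, a one-time measurement like this says nothing about a
later clocksource switch; it only tells the app what it gets right now.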

But if we go with a vget_tsc_raw() or direct-TSC solution,
you have convinced me of the need for notification when the
kernel switches clock sources. Maybe this is a perfect use
for (at least one bit in) the TSC_AUX register and the
rdtscp instruction?
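A sketch of how the rdtscp idea might look from user space. RDTSCP
returns the TSC and the contents of TSC_AUX in a single instruction, so
both values come from the same CPU. Linux already stores the CPU number
in TSC_AUX (for vgetcpu()), so any flag would have to share the register;
the choice of bit 31 and the name TSC_AUX_UNSTABLE_BIT are purely
hypothetical, as are the other names below. __rdtscp() is the GCC/Clang
intrinsic from <x86intrin.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* HYPOTHETICAL convention: the kernel would set this TSC_AUX bit when
 * user space must stop trusting the raw TSC. No such bit exists today. */
#define TSC_AUX_UNSTABLE_BIT (1u << 31)

struct tsc_sample {
	uint64_t tsc;
	uint32_t aux;
};

/* A sample is usable only if the kernel has not flagged the TSC. */
static bool tsc_sample_usable(struct tsc_sample s)
{
	return (s.aux & TSC_AUX_UNSTABLE_BIT) == 0;
}

#if defined(__x86_64__) || defined(__i386__)
/* One RDTSCP reads both the TSC and TSC_AUX, so the flag describes the
 * CPU the timestamp was actually taken on, even across migration. */
static struct tsc_sample read_tsc_sample(void)
{
	unsigned int aux;
	struct tsc_sample s;

	s.tsc = __rdtscp(&aux);
	s.aux = aux;
	return s;
}
#endif
```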

And I do agree with Venki that some user library (or at
least published sample code) should be made available to
demonstrate proper usage and to mitigate the worst of the
"broken user problem".
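In the spirit of Venki's suggestion, a minimal sketch of the kind of
sample code that could ship with the interface: trust the raw TSC only
when the kernel positively reports it as reliable, and fall back to
clock_gettime() otherwise, including when the file is missing. The sysfs
path below is a placeholder; the patch's actual file name and location
are assumptions, as are the helper names.

```c
#include <assert.h>
#include <stdio.h>

/* HYPOTHETICAL location of the exported "tsc_reliable" flag. */
#define TSC_RELIABLE_PATH \
	"/sys/devices/system/clocksource/clocksource0/tsc_reliable"

enum time_source { TIME_SOURCE_CLOCK_GETTIME, TIME_SOURCE_TSC };

/* Returns the integer stored in the file, or -1 if it cannot be read. */
static int read_flag(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/* Use the raw TSC only on an explicit "1"; anything else, including a
 * missing file on older kernels, selects clock_gettime(). */
static enum time_source choose_time_source(const char *path)
{
	return read_flag(path) == 1 ? TIME_SOURCE_TSC
				    : TIME_SOURCE_CLOCK_GETTIME;
}
```

A caller would run choose_time_source(TSC_RELIABLE_PATH) once at startup.
Defaulting to the slow-but-safe path keeps the sample correct on kernels
that never export the flag.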

> > From: Arjan van de Ven [mailto:arjan@xxxxxxxxxxxxx]
> > can you name said "enterprise" software by name please? We need a huge
> > advertisement to let people know not to trust their important data to
> > it..

For obvious reasons I can't do that, but I can point to
enterprise *operating systems* that have long since solved
this same problem one way or another: Solaris on x86 and
HP-UX (the latter admittedly on ia64). Enterprise app
vendors are quite happy to require conformance to a
completely specified software/hardware/firmware stack
before supporting an app customer. I'm just trying to
ensure that Linux can be part of that spec.