Re: [RFC PATCH 1/3] Unified trace buffer

From: Jeremy Fitzhardinge
Date: Thu Sep 25 2008 - 18:40:10 EST


Linus Torvalds wrote:
> The reason I say "and no" is that it's not technically really possible to
> atomically give the exact TSC at which the frequency change took place. We
> just don't have the information, and I doubt we will ever have it.
>

Well, you don't need the tsc at the precise moment of the frequency
change. You just need to emit the current tsc+frequency+wallclock time
before you emit any more delta records after the frequency change. You
can't fetch all those values instantaneously, but you can get close.
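A minimal sketch of that ordering rule follows. The record types, `struct trace_buf` and the functions `trace_freq_changed()`/`trace_event()` are all hypothetical, not an existing kernel interface. The frequency-change hook only marks the timebase as stale. The next event then emits an absolute tsc+frequency+wallclock record before its own delta. The caller passes in the tsc, frequency and wallclock values, read as close together as it can manage, so the sketch runs without rdtsc or cpufreq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kinds, not an existing kernel trace ABI. */
enum rec_type { REC_SYNC, REC_DELTA };

struct trace_rec {
	enum rec_type type;
	uint64_t tsc;       /* SYNC: absolute tsc; DELTA: ticks since previous record */
	uint32_t khz;       /* SYNC only: tsc frequency in kHz */
	int64_t  tv_sec;    /* SYNC only: wallclock read next to the tsc */
	int64_t  tv_nsec;
};

struct trace_buf {
	struct trace_rec rec[64];
	size_t n;
	uint64_t last_tsc;  /* base for the next delta */
	bool need_sync;     /* true initially and after a frequency change */
};

static void push(struct trace_buf *b, struct trace_rec r)
{
	assert(b->n < sizeof(b->rec) / sizeof(b->rec[0]));
	b->rec[b->n++] = r;
}

/* Frequency-change hook: only note that the timebase is stale. There is no
 * need (and no way) to capture the tsc at the exact moment of the change. */
static void trace_freq_changed(struct trace_buf *b)
{
	b->need_sync = true;
}

/* Log an event. If the timebase is stale, an absolute sync record goes out
 * first and becomes the base for the delta, so no delta is ever
 * interpreted with the old frequency. */
static void trace_event(struct trace_buf *b, uint64_t tsc, uint32_t khz,
			int64_t tv_sec, int64_t tv_nsec)
{
	if (b->need_sync) {
		push(b, (struct trace_rec){ .type = REC_SYNC, .tsc = tsc, .khz = khz,
					    .tv_sec = tv_sec, .tv_nsec = tv_nsec });
		b->last_tsc = tsc;
		b->need_sync = false;
	}
	push(b, (struct trace_rec){ .type = REC_DELTA, .tsc = tsc - b->last_tsc });
	b->last_tsc = tsc;
}
```

Emitting the sync lazily, at the next event, keeps the notifier trivial and guarantees the ordering without any atomic snapshot of the change itself.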


> As such, there is no point in trying to make it a low-level special op,
> because we'd _still_ end up being totally equivalent with just doing as
> regular trace-event, with a regular TSC field, and then just fill the data
> field with the new frequency.
>
> But yes, I do think we'd need to have that as a trace packet type. I
> thought I even said so in my RFC for packet types. Ahh, it was in the
> follow-up:
>
>
>> I guess I should perhaps have put the TSC frequency in there in that "case
>> 2" thing too. Maybe that should be in "data" (in kHz) and tv_sec/tv_nsec
>> should be in array[0..1], and the time sync packet would be 24 bytes.
>>
>
> but yes, we obviously need the frequency in order to calculate some kind
> of wall-clock time (it doesn't _have_ to be in the same packet type as the
> thing that tries to sync with a real clock, but it makes sense for it to
> be there).
>

Yeah. If you ever mention wallclock time in the event stream, you have
to tie it to your local timebase (tsc+frequency), so that the tsc-stamped
events and the wallclock time can be placed on one timeline.
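On the consumer side, this might look like the following sketch. It assumes a hypothetical sync record that carries the tsc, its frequency in kHz and a wallclock reading taken next to it. The layout is made up for illustration and is not the packet format from the RFC. Each later event's tsc is converted to wallclock time relative to the most recent sync. Splitting the ticks into whole milliseconds plus a remainder avoids overflowing 64 bits in `ticks * 1000000`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sync record: the local timebase (tsc + frequency) tied to a
 * wallclock reading taken at (nearly) the same moment. */
struct time_sync {
	uint64_t tsc;       /* tsc when the sync was taken */
	uint32_t khz;       /* tsc frequency in kHz, i.e. ticks per ms */
	int64_t  tv_sec;    /* wallclock at that tsc */
	int64_t  tv_nsec;
};

/* ns = ticks * 1000000 / khz, computed as whole ms plus remainder so it does
 * not overflow as long as the result itself fits in 64 bits. */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t khz)
{
	assert(khz != 0);
	return (ticks / khz) * 1000000u + (ticks % khz) * 1000000u / khz;
}

/* Wallclock time of an event stamped 'tsc', logged after sync 's'
 * (assumes tsc >= s->tsc). */
static void tsc_to_wall(const struct time_sync *s, uint64_t tsc,
			int64_t *sec, int64_t *nsec)
{
	uint64_t ns = ticks_to_ns(tsc - s->tsc, s->khz);
	uint64_t total = (uint64_t)s->tv_nsec + ns;

	*sec = s->tv_sec + (int64_t)(total / 1000000000u);
	*nsec = (int64_t)(total % 1000000000u);
}
```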

> That said, if people think they can do a good job of ns conversion, I'll
> stop arguing. Quite frankly, I think people are wrong about that, and
> quite frankly, I think that anybody who looks even for one second at those
> "alternate" sched_clock() implementations should realize that they aren't
> suitable, but whatever. I'm not writing the code, I can only try to
> convince people to not add the insane call-chains we have now.

Yeah. Unfortunately, in the virtual case - unless you're virtualizing
the tsc itself, which is horrible - you can't really control or measure
how the tsc is going to behave, because it's all under the hypervisor's
control. A "cpu" could be migrated between different physical cpus, the
whole machine could be migrated between hosts, or suspended, etc, making
it very hard to use the naked tsc. In that case the only real option is
to use a hypervisor-supplied timebase (which for Xen and KVM is a
tsc-based scheme exactly like we've been discussing, except the
hypervisor provides the tsc timing parameters).
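To illustrate the hypervisor-supplied timebase, here is a simplified reader modelled on the pvclock scheme. The hypervisor publishes `tsc_timestamp`, `system_time`, a 32.32 fixed-point `tsc_to_system_mul` and a `tsc_shift`. It also bumps a version counter, which is odd while an update is in progress, so the guest can detect a torn read (for example across a migration) and retry. The struct is a cut-down stand-in, not the real ABI layout. The tsc is passed in rather than read with rdtsc inside the loop, so the sketch runs anywhere, and `unsigned __int128` is a GCC/Clang extension on 64-bit targets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Simplified stand-in for the per-vcpu time info a hypervisor publishes
 * (field names follow pvclock; this is not the real ABI definition). */
struct pv_time_info {
	uint32_t version;            /* odd while the hypervisor is updating */
	uint64_t tsc_timestamp;      /* tsc when system_time was sampled */
	uint64_t system_time;        /* ns at tsc_timestamp */
	uint32_t tsc_to_system_mul;  /* 32.32 fixed-point ns per (shifted) tick */
	int8_t   tsc_shift;          /* pre-shift applied to the tsc delta */
};

/* ns = ((delta shifted by tsc_shift) * mul) >> 32, via a 128-bit product. */
static uint64_t pv_scale(uint64_t delta, uint32_t mul, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/* Retry until the same even version is seen before and after the reads,
 * i.e. the parameters were not rewritten while we used them.
 * Assumes tsc >= ti->tsc_timestamp. */
static uint64_t pv_clock_ns(const volatile struct pv_time_info *ti, uint64_t tsc)
{
	uint32_t ver;
	uint64_t ns;

	do {
		ver = ti->version;
		atomic_thread_fence(memory_order_acquire);
		ns = ti->system_time +
		     pv_scale(tsc - ti->tsc_timestamp,
			      ti->tsc_to_system_mul, ti->tsc_shift);
		atomic_thread_fence(memory_order_acquire);
	} while ((ver & 1) || ver != ti->version);
	return ns;
}
```

With `tsc_to_system_mul = 0x80000000` and `tsc_shift = 0`, each tick counts as half a nanosecond, which corresponds to a 2 GHz tsc.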

arch/x86/kernel/pvclock.c does the tsc to ns conversion with just adds
and multiplies, but unfortunately it can't be expressed directly in C,
because it relies on the double-width product that x86 multiply
instructions provide.
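To show what that extra precision buys: on x86-64 a single `mul` yields the full 128-bit product, and shifting it right by 32 gives the scaled result. Plain C has no operator for the high half of a product. A portable sketch therefore has to split the 64-bit delta into 32-bit halves and combine two 32x32->64 multiplies. `mul_frac32` and `scale_delta_portable` are illustrative names, not the kernel's functions.

```c
#include <assert.h>
#include <stdint.h>

/* (delta * mul_frac) >> 32, truncated to 64 bits, using only 64-bit C
 * arithmetic. With delta = hi * 2^32 + lo:
 *   (delta * m) >> 32 == hi * m + ((lo * m) >> 32)
 * Both partial products are < 2^64, so neither multiply overflows. */
static uint64_t mul_frac32(uint64_t delta, uint32_t mul_frac)
{
	uint64_t hi = delta >> 32;
	uint64_t lo = delta & 0xffffffffu;

	return hi * mul_frac + ((lo * mul_frac) >> 32);
}

/* tsc delta -> ns with a pvclock-style pre-shift and 32.32 multiplier. */
static uint64_t scale_delta_portable(uint64_t delta, uint32_t mul_frac, int shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return mul_frac32(delta, mul_frac);
}
```

The portable version costs an extra multiply, a shift and an add compared with the one-instruction x86-64 form, which is the gap the inline asm exists to close.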

J