Re: [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in
From: Alexander Shishkin
Date: Thu Jul 30 2015 - 08:49:39 EST
Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> On Thu, Jul 30, 2015 at 02:53:51PM +0300, Alexander Shishkin wrote:
>> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>>
>> > On Fri, Jul 17, 2015 at 04:34:09PM +0300, Alexander Shishkin wrote:
>> >> Currently, the PT driver zeroes out the status register every time before
>> >> starting the event. However, all the writable bits are already taken care
>> >> of in pt_handle_status() function, except the new PacketByteCnt field,
>> >> which in new versions of PT contains the number of packet bytes written
>> >> since the last sync (PSB) packet. Zeroing it out before enabling PT forces
>> >> a sync packet to be written. This means that, with the existing code, a
>> >> sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
>> >> time a PT event is scheduled in.
>> >>
>> >> To avoid these unnecessary syncs and save a WRMSR in the fast path, this
>> >> patch adds a new attribute config bit "no_force_psb", which will disable
>> >> this zeroing WRMSR.
>> >
>> > Why is this exposed?
>>
>> The default behavior, which we have in the kernel now, is to always
>> force the PSB, so if we change it, we need to make sure nobody is
>> relying on it being the default behavior.
>
> Seeing how there's hardly any tools out there for this -- and its only
> recently introduced to the kernel, I think we can change this
> behaviour.
Well, aside from perf tool itself (I'm not sure what's been happening to
the PT support there), there's gdb support, which has been merged to the
gdb mainline some time ago and makes use of our perf interface. I don't
know, however, if it relies on this or not.
>> Granted, for the hardware
>> that's on the market right now it won't make a difference (it generates
>> PSB+ at least every 256 bytes anyway), but in future if we change the
>> default, newer hardware will produce different results with 4.2 than with
>> the newer kernels and this would be a tad sloppy.
>
> Even more reason not to expose this. By the time there's hardware out
> there that supports this, 4.2 will be an ancient kernel :-)
>
>> The tools, however, can decide to set no_force_psb=1 when no_force_psb
>> is available (sysfs caps) and otherwise the default behavior will be the
>> same as no_force_psb==0. Makes sense?
>
> I'd rather not expose this. It doesn't make a difference to the actual
> trace data other than it being bigger. Any decoder must be able to
> decode both versions. And you cannot really tell anything from there
> being too many PSB frames.
It kinda does make a difference: inside PSB+ you get a TSC packet, which
carries the tsc timestamp as a payload, so, with the current behavior,
since you'll get one of those at every event schedule-in, you'll have
many more time synchronization points with the perf stream, but when the
default changes, suddenly your timestamps way further apart and the
decoder needs to put in additional guesswork to correlate those MMAP
events in the perf stream with the PT trace.
So normally, instead of forcing the PSB+, one would enable MTC (mini
time counter) packets, which would give better time reference points. If
one knows to do this, one doesn't need extra PSBs, if one doesn't, but
stuff sort of works for them because of the frequent PSBs, they will
break after the default changes. The decoders, ideally, should
eventually learn to deal with both cases, so in the long run it
shouldn't matter at all. I just typed in all that to clarify exactly
what the difference in trace data would mean.
Regards,
--
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/