Re: [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in
From: Peter Zijlstra
Date: Thu Jul 30 2015 - 08:14:06 EST
On Thu, Jul 30, 2015 at 02:53:51PM +0300, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>
> > On Fri, Jul 17, 2015 at 04:34:09PM +0300, Alexander Shishkin wrote:
> >> Currently, the PT driver zeroes out the status register every time before
> >> starting the event. However, all the writable bits are already taken care
> >> of in pt_handle_status() function, except the new PacketByteCnt field,
> >> which in new versions of PT contains the number of packet bytes written
> >> since the last sync (PSB) packet. Zeroing it out before enabling PT forces
> >> a sync packet to be written. This means that, with the existing code, a
> >> sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
> >> time a PT event is scheduled in.
> >>
> >> To avoid these unnecessary syncs and save a WRMSR in the fast path, this
> >> patch adds a new attribute config bit "no_force_psb", which will disable
> >> this zeroing WRMSR.
> >
> > Why is this exposed?
>
> The default behavior, which we have in the kernel now, is to always
> force the PSB, so if we change it, we need to make sure nobody is
> relying on it being the default behavior.

Seeing how there are hardly any tools out there for this -- and it's only
recently been introduced to the kernel, I think we can change this behaviour.

> Granted, for the hardware
> that's on the market right now it won't make a difference (it generates
> PSB+ at least every 256 bytes anyway), but in future if we change the
> default, newer hardware will produce different results with 4.2 than with
> the newer kernels and this would be a tad sloppy.

Even more reason not to expose this. By the time there's hardware out
there that supports this, 4.2 will be an ancient kernel :-)

> The tools, however, can decide to set no_force_psb=1 when no_force_psb
> is available (sysfs caps) and otherwise the default behavior will be the
> same as no_force_psb==0. Makes sense?

I'd rather not expose this. It doesn't make a difference to the actual
trace data other than it being bigger. Any decoder must be able to
decode both versions. And you cannot really tell anything from there
being too many PSB frames.
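
For reference, the behaviour being discussed can be sketched as a small
user-space model. This is only an illustration under stated assumptions,
not the driver's code: struct pt_model and the pt_model_*() helpers are
hypothetical names, the status register is reduced to a single
packet_byte_cnt field, and the hardware's periodic PSB generation is
ignored. The only figure taken from the patch description is the 18 bytes
for PSB plus PSBEND.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PSB and PSBEND, 18 bytes in total, per the patch description. */
#define PSB_PLUS_BYTES 18u

/* Hypothetical toy model of one PT event; not a kernel structure. */
struct pt_model {
	uint64_t packet_byte_cnt; /* bytes written since the last sync */
	uint64_t trace_bytes;     /* total bytes in the trace buffer */
	unsigned wrmsr_count;     /* status-register writes on schedule-in */
	unsigned psb_count;       /* sync sequences emitted */
};

/* Model of scheduling the event in. */
void pt_model_sched_in(struct pt_model *pt, bool no_force_psb)
{
	assert(pt != NULL);

	if (!no_force_psb) {
		/* Existing behaviour: zero the status register (one WRMSR). */
		pt->packet_byte_cnt = 0;
		pt->wrmsr_count++;
	}

	/* Assumption: a zero byte count makes the hardware emit a sync. */
	if (pt->packet_byte_cnt == 0) {
		pt->psb_count++;
		pt->trace_bytes += PSB_PLUS_BYTES;
	}
}

/* Model of the hardware writing nbytes of ordinary packets. */
void pt_model_emit(struct pt_model *pt, uint64_t nbytes)
{
	assert(pt != NULL);

	pt->packet_byte_cnt += nbytes;
	pt->trace_bytes += nbytes;
}

/* Run a number of time slices, each producing the same payload. */
void pt_model_run(struct pt_model *pt, unsigned slices,
		  uint64_t payload, bool no_force_psb)
{
	for (unsigned i = 0; i < slices; i++) {
		pt_model_sched_in(pt, no_force_psb);
		pt_model_emit(pt, payload);
	}
}
```

In this model, three slices of 100 payload bytes give three syncs, three
register writes and 354 trace bytes with forcing, against one sync, no
writes and 318 bytes without it. The payload bytes are the same in both
cases, which is the point made above: the trace only gets bigger.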