Re: [PATCH v8 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads

From: Peter Zijlstra
Date: Tue Sep 11 2018 - 10:19:16 EST


On Tue, Sep 11, 2018 at 08:35:12AM +0200, Ingo Molnar wrote:
> > Well, explicit threading in the tool for AIO, in the simplest case, means
> > incorporating some POSIX API implementation into the tool, avoiding
> > code reuse in the first place. That tends to be error prone and costly.
>
> It's a core competency, we better do it right and not outsource it.
>
> Please take a look at Jiri's patches (once he re-posts them), I think it's a very good
> starting point.

There's another reason for doing custom per-cpu threads; it avoids
bouncing the buffer memory around the machine. If the task doing the
buffer reads runs on the exact same CPU as the one doing the writes,
there's less memory traffic on the interconnects.
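
To make the per-cpu thread idea concrete, here is a minimal sketch of
one reader thread per allowed CPU, each pinned with
pthread_setaffinity_np(). It is not taken from Jiri's patches or from
the perf tool; struct cpu_reader, reader_main() and
run_pinned_readers() are made-up names, and the actual buffer draining
is left as a comment.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical per-CPU reader state; not a perf tool structure. */
struct cpu_reader {
	int cpu;		/* CPU whose buffer this thread would drain */
	int pinned;		/* 1 once the affinity call succeeded */
	pthread_t thread;
};

static void *reader_main(void *arg)
{
	struct cpu_reader *r = arg;
	cpu_set_t set;

	/* Restrict this thread to the single CPU it is responsible for. */
	CPU_ZERO(&set);
	CPU_SET(r->cpu, &set);
	if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0)
		r->pinned = 1;

	/* A real reader would loop here, draining only this CPU's buffer. */
	return NULL;
}

/*
 * Start one pinned reader per CPU in the caller's affinity mask, up to
 * max, and wait for them. Returns the number of readers started, or -1
 * if the affinity mask could not be read.
 */
int run_pinned_readers(struct cpu_reader *readers, int max)
{
	cpu_set_t allowed;
	int n = 0;

	assert(readers != NULL && max > 0);
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	for (int cpu = 0; cpu < CPU_SETSIZE && n < max; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;
		readers[n].cpu = cpu;
		readers[n].pinned = 0;
		if (pthread_create(&readers[n].thread, NULL,
				   reader_main, &readers[n]) != 0)
			break;
		n++;
	}

	for (int i = 0; i < n; i++)
		pthread_join(readers[i].thread, NULL);

	return n;
}
```

Walking the affinity mask instead of assuming CPUs 0..N-1 keeps the
sketch working under cpusets, where the allowed CPUs need not be
contiguous.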

Also, I think we can avoid the MFENCE in that case, but I'm not sure
that one is hot enough to bother about on the perf reading side of
things.
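
For reference, a rough model of where that barrier sits on the reading
side. The perf mmap protocol asks user space for a read barrier after
loading data_head and a full barrier before storing data_tail; the full
barrier is the one that has typically been an MFENCE on x86. The sketch
below is a stand-alone model in C11 atomics, not the perf mmap page
itself: struct ring, ring_read() and the same_cpu flag are made up, and
downgrading the fence to a compiler-only barrier when reader and writer
share a CPU is my reading of the idea above, not something verified.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096u	/* hypothetical size, must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE not 2^n");

/* Hypothetical stand-in for a ring with producer head / consumer tail. */
struct ring {
	_Atomic uint64_t head;	/* advanced by the writer */
	_Atomic uint64_t tail;	/* advanced by the reader */
	unsigned char data[RING_SIZE];
};

/*
 * Copy up to cap pending bytes into out and publish the new tail.
 * same_cpu != 0 models the case where reader and writer are known to
 * run on one CPU (an assumption the caller has to guarantee).
 */
size_t ring_read(struct ring *rb, unsigned char *out, size_t cap,
		 int same_cpu)
{
	/* Acquire pairs with the writer publishing head after the data. */
	uint64_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
	uint64_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
	size_t n = (size_t)(head - tail);

	if (n > cap)
		n = cap;
	for (size_t i = 0; i < n; i++)
		out[i] = rb->data[(tail + i) & (RING_SIZE - 1)];

	/* Order the data reads above before the tail store below. */
	if (same_cpu)
		atomic_signal_fence(memory_order_seq_cst);  /* compiler only */
	else
		atomic_thread_fence(memory_order_seq_cst);  /* full barrier */

	atomic_store_explicit(&rb->tail, tail + n, memory_order_relaxed);
	return n;
}
```

Whether the saved fence is measurable is exactly the open question; the
sketch only shows which instruction would go away.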