Re: [PATCH RFC] hist lookups

From: David Miller
Date: Wed Oct 31 2018 - 12:08:22 EST


From: Jiri Olsa <jolsa@xxxxxxxxxx>
Date: Wed, 31 Oct 2018 16:39:07 +0100

> it'd be great to make hist processing faster, but is your main target here
> to get the load out of the reader thread, so we dont lose events during the
> hist processing?
>
> we could queue events directly from reader thread into another thread and
> keep it (the reader thread) free of processing, focusing only on event
> reading/passing

Indeed, we could create threads that take samples from the thread processing
the ring buffers, and insert them into the histogram.

In fact, since there is already pthread locking around the histogram
data structures, we could parallelize that as much as we want.
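
To make that concrete, here is a rough sketch of worker threads inserting
samples into a mutex-protected histogram. Everything in it is made up for
illustration (toy_hist, worker_arg, hist_worker, toy_hist_fill, NR_BUCKETS,
MAX_WORKERS); it is not perf's actual hists code, and it assumes a single
lock around the whole structure.

```c
#include <pthread.h>
#include <stddef.h>

#define NR_BUCKETS	64	/* hypothetical histogram size */
#define MAX_WORKERS	8	/* hypothetical upper bound on workers */

/* Toy stand-in for a histogram protected by one pthread mutex. */
struct toy_hist {
	pthread_mutex_t	lock;
	unsigned long	count[NR_BUCKETS];
};

/* Slice of the sample stream handed to one worker. */
struct worker_arg {
	struct toy_hist		*hist;
	const unsigned int	*samples;
	size_t			nr_samples;
};

/* Worker: insert each sample while holding the histogram lock. */
static void *hist_worker(void *p)
{
	struct worker_arg *arg = p;
	size_t i;

	for (i = 0; i < arg->nr_samples; i++) {
		pthread_mutex_lock(&arg->hist->lock);
		arg->hist->count[arg->samples[i] % NR_BUCKETS]++;
		pthread_mutex_unlock(&arg->hist->lock);
	}
	return NULL;
}

/* Split the samples across nr_workers threads; returns 0 on success. */
static int toy_hist_fill(struct toy_hist *hist, const unsigned int *samples,
			 size_t nr_samples, size_t nr_workers)
{
	pthread_t tid[MAX_WORKERS];
	struct worker_arg arg[MAX_WORKERS];
	size_t chunk, started = 0, i;
	int ret = 0;

	if (nr_workers == 0 || nr_workers > MAX_WORKERS)
		return -1;

	/* Samples per worker, rounded up so nothing is left over. */
	chunk = (nr_samples + nr_workers - 1) / nr_workers;

	for (i = 0; i < nr_workers; i++) {
		size_t first = i * chunk;
		size_t len;

		if (first > nr_samples)
			first = nr_samples;
		len = nr_samples - first;
		if (len > chunk)
			len = chunk;

		arg[i].hist = hist;
		arg[i].samples = samples + first;
		arg[i].nr_samples = len;

		if (pthread_create(&tid[i], NULL, hist_worker, &arg[i])) {
			ret = -1;
			break;
		}
		started++;
	}

	/* Wait for every worker that was actually started. */
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	return ret;
}
```

With one coarse lock like this the workers mostly serialize on the insert,
so how much this buys depends on how much work happens outside the lock.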

If beneficial we could also parallelize the ring buffer processing
into a small number of threads too.

My understanding is that in its default mode perf gets one event ring
buffer per cpu being analyzed. So we could divide that number of
rings by some factor, like 16 or something, and thus divide the rings
into groups of 16 with one thread assigned to each group.
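
The grouping arithmetic would look roughly like this. The names
(RINGS_PER_THREAD, nr_reader_threads, ring_to_thread, thread_ring_range)
are hypothetical, and 16 is only the example factor from above, not a
tuned value.

```c
#include <stddef.h>

/* Hypothetical group size; "16 or something". */
#define RINGS_PER_THREAD	16

/* Number of reader threads needed to cover nr_rings, rounding up. */
static size_t nr_reader_threads(size_t nr_rings)
{
	return (nr_rings + RINGS_PER_THREAD - 1) / RINGS_PER_THREAD;
}

/* Index of the reader thread that owns a given ring. */
static size_t ring_to_thread(size_t ring)
{
	return ring / RINGS_PER_THREAD;
}

/* Half-open range [first, last) of rings handled by one thread. */
static void thread_ring_range(size_t thread, size_t nr_rings,
			      size_t *first, size_t *last)
{
	*first = thread * RINGS_PER_THREAD;
	if (*first > nr_rings)
		*first = nr_rings;

	*last = *first + RINGS_PER_THREAD;
	if (*last > nr_rings)
		*last = nr_rings;
}
```

So a 64 cpu system would get 4 reader threads, and a machine with 40 cpus
would get 3, with the last thread only handling 8 rings.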

There is one major concern about this though. Creating threads makes
perf a bit more "invasive" to the workload it is observing. And that
is something we've always worked to minimize.

I think your idea to add threads for the histogram work is great.

But I still think that the histogram code is really bloated, and doing
a full 262 byte memset on every histogram lookup is unnecessary
overhead.
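
As a sketch of the general direction (not a patch, and not perf's real
structures), the lookup could compare on the key fields alone and only
zero a full entry when a new one actually has to be created. toy_entry,
toy_table, toy_findnew and TOY_MAX are invented names for illustration,
and the linear search is only there to keep the example short.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX	128	/* hypothetical table capacity */

/* Toy entry: two key fields plus a large non-key payload. */
struct toy_entry {
	unsigned long	ip;		/* comparison key */
	int		pid;		/* comparison key */
	unsigned long	hits;		/* per-entry statistic */
	char		payload[240];	/* stands in for the rest of the entry */
};

struct toy_table {
	struct toy_entry	entries[TOY_MAX];
	size_t			nr;
};

/*
 * Find the entry for (ip, pid), creating it if needed.
 * The lookup only touches the key fields; the memset happens
 * once per new entry instead of once per lookup.
 */
static struct toy_entry *toy_findnew(struct toy_table *t,
				     unsigned long ip, int pid)
{
	struct toy_entry *e;
	size_t i;

	for (i = 0; i < t->nr; i++) {
		e = &t->entries[i];
		if (e->ip == ip && e->pid == pid)
			return e;	/* hit: nothing was zeroed */
	}

	if (t->nr == TOY_MAX)
		return NULL;		/* table full */

	e = &t->entries[t->nr++];
	memset(e, 0, sizeof(*e));	/* miss: zero the new entry once */
	e->ip = ip;
	e->pid = pid;
	return e;
}
```

In a profile most lookups hit an existing entry, so in this scheme the
zeroing cost is paid per unique entry rather than per sample.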