Re: [PATCH RFC] hist lookups

From: Jiri Olsa
Date: Sun Nov 04 2018 - 15:18:28 EST


On Fri, Nov 02, 2018 at 11:30:03PM -0700, David Miller wrote:
> From: David Miller <davem@xxxxxxxxxxxxx>
> Date: Wed, 31 Oct 2018 09:08:16 -0700 (PDT)
>
> > From: Jiri Olsa <jolsa@xxxxxxxxxx>
> > Date: Wed, 31 Oct 2018 16:39:07 +0100
> >
> >> it'd be great to make hist processing faster, but is your main target here
> >> to get the load out of the reader thread, so we don't lose events during the
> >> hist processing?
> >>
> >> we could queue events directly from the reader thread into another thread and
> >> keep it (the reader thread) free of processing, focusing only on event
> >> reading/passing
> >
> > Indeed, we could create threads that take samples from the thread processing
> > the ring buffers, and insert them into the histogram.
>
> So I played around with some ideas like this and ran into some dead ends.
>
> I ran each mmap ring's processing in a separate thread.
>
> This doesn't help at all; the problem is that all the threads serialize
> on the pthread lock guarding the histogram part of the work.
>
> And the histogram part dominates the cost of processing each sample.

yep, it sucks.. I was thinking of keeping separate hist objects for
each thread and merging them at the end
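
A minimal sketch of that per-thread idea, assuming a toy histogram that is
just an array of counters indexed by a hypothetical symbol id. perf's real
`struct hists` keeps `struct hist_entry` nodes in rbtrees, so a real merge
would walk and collapse entries rather than add arrays; `toy_hist`,
`worker_fn`, `toy_hist_merge` and `run_per_thread_hists` are all made up for
illustration. Each worker updates only its own histogram, so the per-sample
path takes no lock, and the merge runs once per thread after `pthread_join()`.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_THREADS        8
#define NR_BUCKETS         16   /* hypothetical: one bucket per symbol id */
#define SAMPLES_PER_THREAD 1000

/* Toy histogram: one counter per (hypothetical) symbol id. */
struct toy_hist {
	unsigned long count[NR_BUCKETS];
};

struct worker {
	pthread_t       tid;
	int             id;
	struct toy_hist hist;   /* private to this thread: no lock needed */
};

/* Stand-in for draining one mmap ring: every sample maps to a symbol id. */
static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (int i = 0; i < SAMPLES_PER_THREAD; i++) {
		int sym = (w->id + i) % NR_BUCKETS;  /* fake symbol resolution */
		w->hist.count[sym]++;                /* uncontended update */
	}
	return NULL;
}

/* Fold one per-thread histogram into the global one. */
static void toy_hist_merge(struct toy_hist *dst, const struct toy_hist *src)
{
	for (int b = 0; b < NR_BUCKETS; b++)
		dst->count[b] += src->count[b];
}

/* Run nr_threads workers, merge at the end, return the total sample count. */
static unsigned long run_per_thread_hists(int nr_threads, struct toy_hist *total)
{
	struct worker workers[MAX_THREADS];
	unsigned long sum = 0;
	int started = 0;

	assert(nr_threads > 0 && nr_threads <= MAX_THREADS);
	memset(workers, 0, sizeof(workers));
	memset(total, 0, sizeof(*total));

	for (int t = 0; t < nr_threads; t++) {
		workers[t].id = t;
		if (pthread_create(&workers[t].tid, NULL, worker_fn, &workers[t]) != 0)
			break;
		started++;
	}

	/* Each merge runs after its worker has exited, so still no locking. */
	for (int t = 0; t < started; t++) {
		pthread_join(workers[t].tid, NULL);
		toy_hist_merge(total, &workers[t].hist);
	}

	for (int b = 0; b < NR_BUCKETS; b++)
		sum += total->count[b];
	return sum;
}
```

The cost moves to memory (one histogram per thread) and a serial merge at
the end, but the merge scales with the number of distinct entries rather
than the number of samples.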

>
> Nevertheless, I started work on formally threading all of the code that
> the mmap threads operate on, such as symbol processing, and while
> doing so I came to the conclusion that pushing only the histogram
> processing into a separate thread poses its own set of big challenges.
>
> To make this work we would have to turn a piece of transient on-stack
> state (the processed event) into allocated, persistent state.
>
> These persistent event structures get queued up to the histogram
> thread(s).
>
> Therefore, if the histogram thread(s) can't keep up (and, as per my
> experiment above, it is easy to enter this state because the histogram
> code itself is going to run linearly with the histogram lock held),
> this persistent event memory will just keep growing.
>
> We would have to find some way to parallelize the histogram code to
> make any kind of threading worthwhile.
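
A minimal sketch of the hand-off described above, assuming a hypothetical
`struct toy_sample` in place of perf's real event/sample types (all names
here are made up for illustration): the reader copies each on-stack sample
into a heap node and appends it to a mutex/condvar-protected queue that a
histogram thread drains. The queue lock is held only to splice pointers, not
for the histogram work, but nothing bounds `depth`, so if the consumer is
slower than the reader, the queue and its memory keep growing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parsed sample that normally lives on the
 * reader thread's stack only while it is being processed. */
struct toy_sample {
	unsigned long ip;
	int pid;
};

struct queued_sample {
	struct queued_sample *next;
	struct toy_sample sample;    /* persistent heap copy */
};

struct sample_queue {
	pthread_mutex_t lock;
	pthread_cond_t  nonempty;
	struct queued_sample *head, *tail;
	size_t depth;                /* unbounded: grows if the consumer lags */
	bool closed;
};

static void sample_queue_init(struct sample_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->nonempty, NULL);
	q->head = q->tail = NULL;
	q->depth = 0;
	q->closed = false;
}

/* Reader side: turn the transient on-stack sample into allocated state. */
static int sample_queue_push(struct sample_queue *q, const struct toy_sample *s)
{
	struct queued_sample *qs = malloc(sizeof(*qs));

	if (!qs)
		return -1;
	qs->next = NULL;
	qs->sample = *s;             /* heap copy, now owned by the queue */

	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = qs;
	else
		q->head = qs;
	q->tail = qs;
	q->depth++;
	pthread_cond_signal(&q->nonempty);
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* Histogram side: wait for a sample; returns false once closed and drained. */
static bool sample_queue_pop(struct sample_queue *q, struct toy_sample *out)
{
	struct queued_sample *qs;

	pthread_mutex_lock(&q->lock);
	while (!q->head && !q->closed)
		pthread_cond_wait(&q->nonempty, &q->lock);
	qs = q->head;
	if (qs) {
		q->head = qs->next;
		if (!q->head)
			q->tail = NULL;
		assert(q->depth > 0);
		q->depth--;
	}
	pthread_mutex_unlock(&q->lock);

	if (!qs)
		return false;
	*out = qs->sample;
	free(qs);
	return true;
}

/* Tell the consumer that no more samples will arrive. */
static void sample_queue_close(struct sample_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->closed = true;
	pthread_cond_broadcast(&q->nonempty);
	pthread_mutex_unlock(&q->lock);
}
```

Bounding the queue would cap the memory, but a full queue then has to either
stall the reader (so the mmap ring backs up again) or drop samples, so it
only moves the problem.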

do you have some code I could look at?

I'm going to add that separate thread to move the processing out
of the reading thread.. I think we need that in any case, so the
ring buffer gets drained as fast as possible

thanks,
jirka