Re: [PATCH RFC] hist lookups
From: David Miller
Date: Wed Nov 07 2018 - 01:13:54 EST
From: Jiri Olsa <jolsa@xxxxxxxxxx>
Date: Tue, 6 Nov 2018 21:42:55 +0100
> I pushed that fix in perf/fixes branch, but I'm still occasionally
> hitting the namespace crash.. working on it ;-)
Jiri, how can this new scheme work without setting copy_on_queue
for the queued_events we use here?
I don't see copy_on_queue being set and that means the queued event
structures reference the event memory directly in the mmaps, after the
mmap thread has released them back to the queue.
That means new events can come in to the mmap ring and overwrite what
was there previously, maybe even while deliver_event() is in the
middle of parsing the event.
Setting copy_on_queue for data[0] and data[1] makes all of the crashes
go away for me.
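To make the hazard concrete, here is a minimal standalone sketch of the
difference between queueing by reference and queueing by copy. It is a toy
model, not the perf code: every name in it (toy_event, toy_queued, toy_queue,
and so on) is hypothetical, and a single struct stands in for a slot in the
mmap ring.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types for illustration; these are not the perf structures. */
struct toy_event {
	int type;
	int payload;
};

struct toy_queued {
	struct toy_event *event;	/* points into the ring, or at a private copy */
	int owned;			/* non-zero if 'event' was allocated by toy_queue() */
};

/* Queue an event, optionally duplicating it first (the copy-on-queue idea). */
static int toy_queue(struct toy_queued *q, struct toy_event *ring_slot, int copy)
{
	if (copy) {
		struct toy_event *dup = malloc(sizeof(*dup));

		if (!dup)
			return -1;
		memcpy(dup, ring_slot, sizeof(*dup));
		q->event = dup;
		q->owned = 1;
	} else {
		/* No copy: the queued entry aliases ring memory. */
		q->event = ring_slot;
		q->owned = 0;
	}
	return 0;
}

static void toy_release(struct toy_queued *q)
{
	if (q->owned)
		free(q->event);
	q->event = NULL;
	q->owned = 0;
}

/* Returns the payload seen at delivery time, after the ring slot was reused. */
static int toy_payload_after_overwrite(int copy)
{
	struct toy_event ring[1] = { { .type = 1, .payload = 100 } };
	struct toy_queued q;
	int seen;

	if (toy_queue(&q, &ring[0], copy))
		return -1;

	/* The slot is handed back and the producer writes a new event into it. */
	ring[0].type = 2;
	ring[0].payload = 200;

	/* Delivery happens only now, long after the event was queued. */
	seen = q.event->payload;
	toy_release(&q);
	return seen;
}
```

Without the copy, delivery sees the payload of the newer event (200) rather
than the one that was queued (100). With the copy, the queued event is
unaffected by reuse of the slot.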
I get a lot of "[unknown]" shared objects shortly after perf top
starts up during a full workload. I've been wondering about one
side effect of how the mmap queues are processed. Consider the
following:
    cpu 0                                   cpu 1
                                            exec
                                            create new mmap2 events
    scheduled to cpu 0 for whatever reason
    sample 1
    sample 2
And let's say that perf top is backlogged processing the mmap ring of
events generated for cpu 0, and sees sample 1 and sample 2 before
getting to any of cpu 1's events.
This means the thread and map and symbol objects won't exist yet and
we'll get those "[unknown]" histogram entries, and they won't go
away.
When it finally stops looping over the mmap ring for cpu 0's events
it gets to cpu 1's mmap ring and sees the exec and mmap2 events
but at that point it's far too late.
I surmise from what I see with perf top right now that this happens
a lot.
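The scenario above can be reproduced with a small standalone model. This is
only a sketch under assumptions: the record type, the timestamps and all
function names are hypothetical, and the timestamp-ordered walk is there
purely for contrast, not as a statement of how perf top does or should
process its rings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kinds; stand-ins for mmap2 and sample events. */
enum toy_kind { TOY_MMAP2, TOY_SAMPLE };

struct toy_rec {
	unsigned long long time;	/* made-up timestamp */
	enum toy_kind kind;
};

/* Process one record; a sample seen before the mapping counts as unknown. */
static void toy_process(const struct toy_rec *r, int *map_known, int *unknown)
{
	if (r->kind == TOY_MMAP2)
		*map_known = 1;
	else if (!*map_known)
		(*unknown)++;
}

/* Drain ring 0 completely, then ring 1 (the backlogged case). */
static int toy_unknown_per_ring(const struct toy_rec *r0, size_t n0,
				const struct toy_rec *r1, size_t n1)
{
	int map_known = 0, unknown = 0;
	size_t i;

	for (i = 0; i < n0; i++)
		toy_process(&r0[i], &map_known, &unknown);
	for (i = 0; i < n1; i++)
		toy_process(&r1[i], &map_known, &unknown);
	return unknown;
}

/* Walk both rings in timestamp order; each ring is already sorted by time. */
static int toy_unknown_time_order(const struct toy_rec *r0, size_t n0,
				  const struct toy_rec *r1, size_t n1)
{
	int map_known = 0, unknown = 0;
	size_t i = 0, j = 0;

	while (i < n0 || j < n1) {
		if (j >= n1 || (i < n0 && r0[i].time < r1[j].time))
			toy_process(&r0[i++], &map_known, &unknown);
		else
			toy_process(&r1[j++], &map_known, &unknown);
	}
	return unknown;
}

/* cpu 1 logs the mmap2 at t=10; after migration cpu 0 logs two samples. */
static int toy_demo(int time_order)
{
	static const struct toy_rec cpu0[] = {
		{ 20, TOY_SAMPLE },
		{ 30, TOY_SAMPLE },
	};
	static const struct toy_rec cpu1[] = {
		{ 10, TOY_MMAP2 },
	};

	if (time_order)
		return toy_unknown_time_order(cpu0, 2, cpu1, 1);
	return toy_unknown_per_ring(cpu0, 2, cpu1, 1);
}
```

In this model, draining cpu 0's ring first leaves both samples unresolved,
while walking the same records in timestamp order resolves both.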