Re: [PATCH v5 00/17] Reference count checker and related fixes

From: Arnaldo Carvalho de Melo
Date: Thu Apr 06 2023 - 08:52:11 EST


Em Wed, Apr 05, 2023 at 07:25:27PM +0300, Adrian Hunter escreveu:
> On 5/04/23 16:20, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Apr 05, 2023 at 11:47:26AM +0300, Adrian Hunter escreveu:
> >> On 4/04/23 21:54, Arnaldo Carvalho de Melo wrote:
> >>> Em Tue, Apr 04, 2023 at 03:41:38PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>>> Em Tue, Apr 04, 2023 at 08:25:41PM +0300, Adrian Hunter escreveu:
> >>>>> I was wondering if the handling of dynamic data like struct map makes
> >>>>> any sense at present. Perhaps someone can reassure me.

> >>>>> A struct map can be updated when an MMAP event is processed. So it

> >>>> Yes, it can, and the update is made via a new PERF_RECORD_MMAP, right?

> >>>> So:

> >>>> perf_event__process_mmap()
> >>>> machine__process_mmap2_event()
> >>>> map__new() + thread__insert_map(thread, map)
> >>>> maps__fixup_overlappings()
> >>>> maps__insert(thread->maps, map);

> >>>> Ok, from this point on new samples on ] map->start .. map->end ] will
> >>>> grab a refcount to this new map in its hist_entry, right?

> >>>> When we want to sort by dso we will look at hist_entry->map->dso, etc.

> >>> And in 'perf top' we go decaying hist entries, when we delete the
> >>> hist_entry, drop the reference count to things it holds, that will then
> >>> be finally deleted when no more hist_entries point to it.

> >>>>> seems like anything racing with event processing is already broken, and
> >>>>> reference counting / locking cannot help - unless there is also
> >>>>> copy-on-write (which there isn't at present)?
> >
> >> So I checked, and struct map *is* copy-on-write in
> >> maps__fixup_overlappings(), so that should not be a problem.
> >
> >>>>> For struct maps, referencing it while simultaneously processing
> >>>>> events seems to make even less sense?
> >
> >>>> Can you elaborate some more?
> >
> >> Only that the maps are not necessarily stable e.g. the map that you
> >> need has been replaced in the meantime.
> >
> > Well, it may be sliced in several or shrunk by new ones overlapping it,
> > but it if completely disappears, say a new map starts before the one
> > disappearing and ends after it, then it remains with reference counts if
> > there are hist_entries (or other data structure) pointing to them,
> > right?

> >> But upon investigation, the only user at the moment is
> >> maps__find_ams(). If we kept the removed maps (we used to),
> >> it might be possible to make maps__find_ams() work correctly
> >> in any case.

> > Humm, I think I see what you mean, maps__find_ams() is called when we
> > are annotating a symbol, not when we're processing a sample, so it may
> > be the case that at the time of annotation the executable that is being
> > found (its parsing the target IP of a 'call' assembly instruction) was
> > replaced, is that the case?

> Yes, that is the possibility

Yeah, this one gets a bit more difficult to support, we would have to
keep a sub-bucket for each annotation instruction with the ordered by
timestamp list of maps that were on that location (but then just to
places that had samples, not to all) and then when add some visual cue
to that annotation line to mean that it was patched and show the
original, then the (possibly) various patches and say that samples up to
N units of time were for some original DSO, then to another (overlapping
executable map), then to some patching (that we would catch with
PERF_RECORD_TEXT_POKE for the kernel, right?), etc.

Seems doable, and for most cases would be similar to what we have right
now, as self-modifying code its not so pervasive (famous last words
;-)).

- Arnaldo