Re: [PATCH v1 0/3] Fix and improve __maps__fixup_overlap_and_insert

From: Ian Rogers
Date: Fri Jun 07 2024 - 00:55:23 EST


On Thu, Jun 6, 2024 at 3:56 AM James Clark <james.clark@xxxxxxx> wrote:
>
>
>
> On 21/05/2024 17:51, Ian Rogers wrote:
> > Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
> >
> > Improve __maps__fixup_overlap_and_insert's performance 21x in the case
> > of overlapping mmaps. sesse@xxxxxxxxxx reported slowness opening
> > perf.data files from chromium where the files contained a large number
> > of overlapping mappings. Improve this case primarily by avoiding
> > unnecessary sorting.
> >
> > Unscientific timing data processing a perf.data file with overlapping
> > mmap events from chromium:
> >
> > Before:
> > real 0m9.856s
> > user 0m9.637s
> > sys 0m0.204s
> >
> > After:
> > real 0m0.675s
> > user 0m0.454s
> > sys 0m0.196s
> >
> > Tested with address/leak sanitizer, invariant checks and validating
> > the before and after output are identical.
> >
> > Ian Rogers (3):
> > perf maps: Fix use after free in __maps__fixup_overlap_and_insert
> > perf maps: Reduce sorting for overlapping mappings
> > perf maps: Add/use a sorted insert for fixup overlap and insert
> >
> > tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
> > 1 file changed, 92 insertions(+), 21 deletions(-)
> >
>
> Reviewed-by: James Clark <james.clark@xxxxxxx>
>
> I'm wondering if there is any point in the non sorted insert any more?
>
> Maps could be always either sorted by name or sorted by address and
> insert() is always a sorted/fixup-overlaps insert depending on the sort
> style of the list.

One of the motivations for the sorted array, instead of an rbtree, was
to simplify reference count checking. We're in much better shape with
that these days, I think the worst "memory leak" is the dsos holding
onto a dso and its symbols indefinitely, instead of removing older
unused dsos. It'd be hard to go back to an rb tree with reference
counting.

For the rbtree insert it was O(log N), the sorted insert these changes
add is O(N) and the regular insert without sorting is O(1). A common
case from scanning /proc/pid/maps is that when map entries are
inserted they are inserted in order. The O(1) insert checks that and
maintains that the maps are sorted.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/maps.c?h=perf-tools-next#n469
For the synthesized maps we should be able to insert N map entries in
O(N). The sorted insert would be similar as the memmoves would always
copy 0 elements. So I can see an argument for keeping the array always
sorted. For the perf.data files we commonly process synthesized mmaps
dominate. In the case for this patch, JIT data dominated, with
frequent overlapping mapping changes. I guess the biggest hurdle to
just always keeping things sorted is the mental hurdle of ignoring the
worst insert complexity, which should never apply given how the maps
are commonly built.

Thanks,
Ian