[PATCH v4 0/7] dso/dsos memory savings and clean up

From: Ian Rogers
Date: Tue Apr 23 2024 - 00:26:33 EST


7 more patches from:
https://lore.kernel.org/lkml/20240202061532.1939474-1-irogers@xxxxxxxxxx/
a near half year old adventure in trying to lower perf's dynamic
memory use. Bits like the memory overhead of opendir are on the
sidelines for now, too much fighting over how
distributions/C-libraries present getdents. These changes are more
good old fashioned replace an rb-tree with a sorted array and add
reference count tracking.

The changes migrate dsos code, the collection of dso structs, more
into the dsos.c/dsos.h files. As with maps and threads, this is done
so the internals can be changed - replacing a linked list (for fast
iteration) and an rb-tree (for fast finds) with a lazily sorted
array. The complexity of operations remain roughly the same, although
iterating an array is likely faster than iterating a linked list, the
memory usage is at least reduce by half.

As fixing the memory usage necessitates changing operations like find,
modify these operations so that they increment the reference count to
avoid races like a find in dsos and a remove. Similarly tighten up
lock usage so that operations working on dsos state hold the
appropriate lock.

Here are some questions (with answers) that I am expecting from reviewers:

- Why not refactor dso with accessors first and then do the other things?

My ambition with this change was to lower memory overhead not to
particularly clean up and fix dso. Fixing the memory overhead, by
refactoring and changing the internals, showed that locking discipline
and reference counting discipline was lacking. The later changes try
to fix these as a service to the community while I am changing the
code and to also ensure that code is correct (more correct than it was
wrt locking and reference counting than before the patches).

Reordering the patches to do the refactoring first will be a giant
pain. It will merge conflict with every other patch in the series and
is basically a request to reimplement everything from square 1. The
only thing I'd have in my favor would be how the code should look at
the end of the series, and reordering patches doesn't change the
eventual outcome of applying the patches. Note also, were I to send
the memory saving patches and then a week later send the API clean up
and reference counting fix patches the patches would be merged in the
order they are here. I've done my best, I know you may consider that
I'm adding to your reviewing overhead but I've also got to think about
the overhead to me.

- Please break apart this change...

The first changes are moving things, but when a broken API is spotted
like the missing get on dsos__find I put it in a change to move the
function and to add the missed get. Could this be two changes? Yes, it
could. Does moving code materially change the behavior of the tool?
No. I've done it in one patch to minimize churn and to some extent for
my sanity. Such changes are less than 100 lines of code and all
independently tested.

- The logic in dso around short, long name and id with sorting is weird

Yes, I've tried to make it less weird while retaining the existing
behavior. It would be easy to make a series of patches just cleaning
it up but I came here to save memory not change the dso API.

- Move the fixes in the 12th patch earlier.

This is possible but then impossible to test with reference count
checking. This does mean there are broken reference counts before the
patch is applied, but this is generally already the case. Yes, some
hypothetical person may decide to fork midway through this patch
series and my order would mean they wouldn't have a fix. I've done my
best while working within the bounds of my time and trying to avoid
churn.

v4. Rebases, dealing with merge conflicts, on tmp.perf-tools-next
dropping the merged:
perf dsos: Attempt to better abstract dsos internals
perf dsos: Tidy reference counting and locking
perf dsos: Add dsos__for_each_dso
perf dso: Move dso functions out of dsos
perf dsos: Switch more loops to dsos__for_each_dso

v3. Rebases and drops merged "perf dso: Reorder variables to save
space in struct dso"

v2. Rebases on top of tmp.perf-tools-next resolving merge conflicts.

Ian Rogers (7):
perf dsos: Switch backing storage to array from rbtree/list
perf dsos: Remove __dsos__addnew
perf dsos: Remove __dsos__findnew_link_by_longname_id
perf dsos: Switch hand code to bsearch
perf dso: Add reference count checking and accessor functions
perf dso: Reference counting related fixes
perf dso: Use container_of to avoid a pointer in dso_data

tools/perf/builtin-annotate.c | 6 +-
tools/perf/builtin-buildid-cache.c | 2 +-
tools/perf/builtin-buildid-list.c | 18 +-
tools/perf/builtin-inject.c | 71 ++-
tools/perf/builtin-kallsyms.c | 2 +-
tools/perf/builtin-mem.c | 4 +-
tools/perf/builtin-report.c | 6 +-
tools/perf/builtin-script.c | 8 +-
tools/perf/builtin-top.c | 4 +-
tools/perf/builtin-trace.c | 2 +-
tools/perf/tests/code-reading.c | 8 +-
tools/perf/tests/dso-data.c | 67 ++-
tools/perf/tests/hists_common.c | 6 +-
tools/perf/tests/hists_cumulate.c | 4 +-
tools/perf/tests/hists_output.c | 2 +-
tools/perf/tests/maps.c | 4 +-
tools/perf/tests/symbols.c | 8 +-
tools/perf/tests/vmlinux-kallsyms.c | 6 +-
tools/perf/ui/browsers/annotate.c | 6 +-
tools/perf/ui/browsers/hists.c | 8 +-
tools/perf/ui/browsers/map.c | 4 +-
tools/perf/util/annotate-data.c | 16 +-
tools/perf/util/annotate.c | 17 +-
tools/perf/util/auxtrace.c | 2 +-
tools/perf/util/block-info.c | 2 +-
tools/perf/util/bpf-event.c | 8 +-
tools/perf/util/build-id.c | 38 +-
tools/perf/util/callchain.c | 2 +-
tools/perf/util/data-convert-json.c | 2 +-
tools/perf/util/db-export.c | 6 +-
tools/perf/util/disasm.c | 36 +-
tools/perf/util/dlfilter.c | 12 +-
tools/perf/util/dso.c | 429 ++++++++-------
tools/perf/util/dso.h | 500 ++++++++++++++++--
tools/perf/util/dsos.c | 286 +++++-----
tools/perf/util/dsos.h | 18 +-
tools/perf/util/event.c | 8 +-
tools/perf/util/header.c | 8 +-
tools/perf/util/hist.c | 4 +-
tools/perf/util/intel-pt.c | 22 +-
tools/perf/util/machine.c | 50 +-
tools/perf/util/map.c | 78 +--
tools/perf/util/maps.c | 14 +-
tools/perf/util/print_insn.c | 2 +-
tools/perf/util/probe-event.c | 25 +-
.../util/scripting-engines/trace-event-perl.c | 6 +-
.../scripting-engines/trace-event-python.c | 21 +-
tools/perf/util/sort.c | 19 +-
tools/perf/util/srcline.c | 65 +--
tools/perf/util/symbol-elf.c | 145 +++--
tools/perf/util/symbol-minimal.c | 4 +-
tools/perf/util/symbol.c | 186 +++----
tools/perf/util/symbol_fprintf.c | 4 +-
tools/perf/util/synthetic-events.c | 24 +-
tools/perf/util/thread.c | 4 +-
tools/perf/util/unwind-libunwind-local.c | 18 +-
tools/perf/util/unwind-libunwind.c | 2 +-
tools/perf/util/vdso.c | 8 +-
58 files changed, 1407 insertions(+), 930 deletions(-)

--
2.44.0.769.g3c40516874-goog