Re: [PATCH v2 00/16] Address some perf memory/data size issues

From: Ian Rogers
Date: Tue May 30 2023 - 10:45:28 EST

On Tue, May 30, 2023 at 12:59 AM Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
> > BSS won't count toward file size, which the patches were primarily
> > going after - but checking the size numbers I have miscalculated from
> > reading size's output that I'm not familiar with. The numbers are
> > still improved, but I just see a 37kb saving, with 5kb more in
> > .rodata. Something but not much. .data.rel.ro is larger, which imo is
> > good, but those pages will still be dirtied so a moot point wrt file
> > size and memory overhead.
> The way perf is written (lots of separate code depending on a single high level
> switch) most pages probably won't be dirtied.

For data everything is relocated when perf is loaded. Setting a
breakpoint on main and then dumping smaps (edited for brevity) I see:
555555554000-5555555f8000 r--p 00000000 fe:01 32936368
Size: 656 kB
Pss: 656 kB
Pss_Dirty: 0 kB
5555555f8000-555555828000 r-xp 000a4000 fe:01 32936368
Size: 2240 kB
Pss: 32 kB
Pss_Dirty: 8 kB
555555828000-555555f23000 r--p 002d4000 fe:01 32936368
Size: 7148 kB
Pss: 64 kB
Pss_Dirty: 0 kB
555555f23000-555555f6d000 r--p 009cf000 fe:01 32936368
Size: 296 kB
Pss: 288 kB
Pss_Dirty: 288 kB
555555f6d000-555555f87000 rw-p 00a19000 fe:01 32936368
Size: 104 kB
Pss: 104 kB
Pss_Dirty: 104 kB
These are roughly header, text, .rodata, .data.rel.ro and .data. So at
the point we enter main we have 392kB of dirty pages (288kB + 104kB)
in .data.rel.ro and .data.
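
For anyone wanting to reproduce the 392kB figure without adding it up by
hand, here is a minimal sketch that sums the Pss_Dirty lines of
smaps-formatted text. The helper name sum_pss_dirty_kb is made up for
illustration and is not part of perf; it sums every Pss_Dirty line it is
given, so feed it only the mappings of interest.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of perf: sum every "Pss_Dirty:" value
 * (in kB) found in smaps-formatted text read from f.
 */
static long sum_pss_dirty_kb(FILE *f)
{
	char line[512];
	long total = 0, kb;

	while (fgets(line, sizeof(line), f)) {
		/* "Pss:" and "Size:" lines fail the literal match and are skipped. */
		if (sscanf(line, "Pss_Dirty: %ld kB", &kb) == 1)
			total += kb;
	}
	return total;
}
```

Passing it the last two mappings from the dump above (the r--p one at
offset 009cf000 and the rw-p one) should give 288 + 104.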

For x86 a large contributor to the relocations is the insn-x86.c test.
The test_data_32 and test_data_64 arrays are 75,024 bytes and 93,600
bytes respectively and are in .data.rel.ro; they account for nearly
40% of it.

In gdb at main entry:
(gdb) p test_data_32[0]
$1 = {data = "\017\061", '\000' <repeats 12 times>, expected_length =
2, expected_rel = 0,
expected_op_str = 0x555555866adc "", expected_branch_str = 0x555555866adc "",
asm_rep = 0x55555586fa2a "0f 31", ' ' <repeats 16 times>, "\trdtsc "}
You can see that all the strings in test_data_32 have been relocated
(even though we haven't run any part of perf yet) and are pointing to
data in .rodata. To avoid these relocations in the generated
pmu-events.c, all the strings are merged into one big string and the
offsets within that string are stored instead of pointers - no
relocations means everything goes in the nice non-dirty .rodata. As the
data in the insn-x86.c test is also generated, a similar trick could be
performed there. There is also the possibility to separate all the perf
builtins into libraries...
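
To make the offset trick concrete, here is a rough sketch of the two
layouts. The struct names, the table contents and the asm_rep_at helper
are invented for illustration (only the "0f 31" rdtsc entry comes from
the gdb output above); this is not the actual pmu-events.c or insn-x86.c
code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pointer form: in a PIE every asm_rep pointer needs a load-time
 * relocation, so a const table of these ends up in .data.rel.ro and
 * its pages are dirtied before main runs.
 */
struct entry_ptr {
	int expected_length;
	const char *asm_rep;
};

/* Offset form: only integers, so no relocations are needed. */
struct entry_off {
	int expected_length;
	unsigned int asm_rep_off;	/* offset into big_string */
};

/* All strings merged into one NUL-separated blob (illustrative data). */
static const char big_string[] =
	"0f 31\trdtsc\0"	/* starts at offset 0, 12 bytes with the NUL */
	"0f 05\tsyscall";	/* starts at offset 12 */

static const struct entry_off table_off[] = {
	{ 2, 0 },
	{ 2, 12 },
};

/* Turn a table index back into a string pointer at run time. */
static const char *asm_rep_at(size_t i)
{
	assert(i < sizeof(table_off) / sizeof(table_off[0]));
	return &big_string[table_off[i].asm_rep_off];
}
```

The cost is one extra add per lookup; in exchange the table contains no
addresses, so the loader has nothing to patch in it.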


> >
> > For huge pages I thought it was correct that things are aligned by max
> > page size which I thought on x86-64 was 2MB, so I tried:
> > EXTRA_LDFLAGS="-z max-page-size=4096"
> > but it made no difference to anything, and with:
> > EXTRA_CFLAGS="-Wl,-z,max-page-size=4096"
> > EXTRA_CXXFLAGS="-Wl,-z,max-page-size=4096"
> > file size just got worse.
> The default alignment to 2MB was dropped in the GNU toolchain in 2018 or
> so.
> -Andi