Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64

From: Petr Tesarik
Date: Tue Nov 12 2024 - 06:35:48 EST

Next message: Alexandre Ghiti: "[PATCH -fixes] drivers: perf: Fix wrong put_cpu() placement"
Previous message: Jeff Layton: "Re: [PATCH v3 0/2] fs: allow statmount to fetch the subtype and devname"
In reply to: Ryan Roberts: "Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64"
Next in thread: Petr Tesarik: "Re: [RFC PATCH v1 00/57] Boot-time page size selection for arm64"

On Tue, 12 Nov 2024 10:19:34 +0000
Ryan Roberts <ryan.roberts@xxxxxxx> wrote:

> On 12/11/2024 09:45, Petr Tesarik wrote:
> > On Mon, 11 Nov 2024 12:25:35 +0000
> > Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >
> >> Hi Petr,
> >>
> >> On 11/11/2024 12:14, Petr Tesarik wrote:
> >>> Hi Ryan,
> >>>
> >>> On Thu, 17 Oct 2024 13:32:43 +0100
> >>> Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >> [...]
> >>> Third, a few micro-benchmarks saw a significant regression.
> >>>
> >>> Most notably, getenv and getenvT2 tests from libMicro were 18% and 20%
> >>> slower with variable page size. I don't know why, but I'm looking into
> >>> it. The system() library call was also about 18% slower, but that might
> >>> be related.
> >>
> >> OK, ouch. I think there are some things we can try to optimize the
> >> implementation further. But I'll wait for your analysis before digging myself.
> >
> > This turned out to be a false positive. The way this microbenchmark was
> > invoked did not get enough samples, so it was mostly dependent on
> > whether caches were hot or cold, and the timing on this specific system
> > with the specific sequence of benchmarks in the suite happens to favour
> > my baseline kernel.
> >
> > After increasing the batch count, I'm getting pretty much the same
> > performance for 6.11 vanilla and patched kernels:
> >
> >                   prc thr  usecs/call  samples  errors  cnt/samp
> > getenv (baseline)   1   1     0.14975       99       0    100000
> > getenv (patched)    1   1     0.14981       92       0    100000
>
> Oh that's good news! Does this account for all 3 of the above tests (getenv,
> getenvT2 and system())?

It does for getenvT2 (a variant of the test with 2 threads), but not
for system. Thanks for asking, I forgot about that one.

I'm getting a substantial difference there (+29% on average over 100 runs):

                  prc thr  usecs/call  samples  errors  cnt/samp  command
system (baseline)   1   1  6937.18016      102       0       100  A=$$
system (patched)    1   1  8959.48032      102       0       100  A=$$

So, yeah, this should in fact be my priority #1.
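
As a quick sanity check on the size of the regression, the relative slowdown can be recomputed from the two usecs/call values reported for system(). This is only an illustrative sketch: the function name `relative_slowdown` is made up for this example, and the two inputs are the figures from the single libMicro output shown here, not the full set of 100 runs.

```python
# usecs/call values copied from the libMicro output for the system test
baseline_usecs = 6937.18016   # 6.11 vanilla kernel
patched_usecs = 8959.48032    # kernel with boot-time page size selection


def relative_slowdown(baseline, patched):
    """Return how much slower `patched` is than `baseline`, in percent."""
    return (patched / baseline - 1.0) * 100.0


slowdown_pct = relative_slowdown(baseline_usecs, patched_usecs)
print(f"system(): {slowdown_pct:+.1f}% relative to baseline")
```

The result lands at roughly 29%, which matches the average quoted for the 100 runs.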

The "system" benchmark measures the duration of system("A=$$"), which
involves starting the system shell (in my case bash-4.4.23), so this is
not really a microbenchmark. I hope perf can help match the difference
to a kernel API.
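
For anyone who wants to reproduce the shape of this measurement without libMicro, here is a minimal sketch of timing system("A=$$") in batches. It is not libMicro's implementation: the helper name `time_system` and its default batch sizes are assumptions for illustration, and Python's `os.system()` goes through the C library's system(), so it starts /bin/sh, which is not necessarily bash.

```python
import os
import statistics
import time


def time_system(command="A=$$", calls_per_sample=20, samples=5):
    """Time os.system(command) in batches.

    Returns one per-call duration in microseconds for each sample,
    so the spread between samples stays visible.
    """
    per_call_usecs = []
    for _ in range(samples):
        start = time.perf_counter()
        for _ in range(calls_per_sample):
            # Each call forks and executes a shell, like C system().
            os.system(command)
        elapsed = time.perf_counter() - start
        per_call_usecs.append(elapsed / calls_per_sample * 1e6)
    return per_call_usecs


if __name__ == "__main__":
    results = time_system()
    print(f"median: {statistics.median(results):.1f} usecs/call "
          f"over {len(results)} samples")
```

Each call includes fork, exec and shell start-up, so the number covers many kernel paths at once. That is why a profiler is needed to attribute the difference.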

Petr T
