Re: [PATCH 0/3 v5] Introduce a bulk order-0 page allocator

From: Mel Gorman
Date: Mon Mar 22 2021 - 12:46:01 EST

In reply to: Jesper Dangaard Brouer: "Re: [PATCH 0/3 v5] Introduce a bulk order-0 page allocator"
Next in thread: Chuck Lever III: "Re: [PATCH 0/3 v5] Introduce a bulk order-0 page allocator"

On Mon, Mar 22, 2021 at 01:04:46PM +0100, Jesper Dangaard Brouer wrote:
> On Mon, 22 Mar 2021 09:18:42 +0000
> Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > This series is based on top of Matthew Wilcox's series "Rationalise
> > __alloc_pages wrapper" and does not apply to 5.12-rc2. If you want to
> > test and are not using Andrew's tree as a baseline, I suggest using the
> > following git tree
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v5r9
>
> page_bench04_bulk[1] micro-benchmark on branch: mm-bulk-rebase-v5r9
> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/bench/page_bench04_bulk.c
>
> BASELINE
> single_page alloc+put: Per elem: 199 cycles(tsc) 55.472 ns
>
> LIST variant: time_bulk_page_alloc_free_list: step=bulk size
>
> Per elem: 206 cycles(tsc) 57.478 ns (step:1)
> Per elem: 154 cycles(tsc) 42.861 ns (step:2)
> Per elem: 145 cycles(tsc) 40.536 ns (step:3)
> Per elem: 142 cycles(tsc) 39.477 ns (step:4)
> Per elem: 142 cycles(tsc) 39.610 ns (step:8)
> Per elem: 137 cycles(tsc) 38.155 ns (step:16)
> Per elem: 135 cycles(tsc) 37.739 ns (step:32)
> Per elem: 134 cycles(tsc) 37.282 ns (step:64)
> Per elem: 133 cycles(tsc) 36.993 ns (step:128)
>
> ARRAY variant: time_bulk_page_alloc_free_array: step=bulk size
>
> Per elem: 202 cycles(tsc) 56.383 ns (step:1)
> Per elem: 144 cycles(tsc) 40.047 ns (step:2)
> Per elem: 134 cycles(tsc) 37.339 ns (step:3)
> Per elem: 128 cycles(tsc) 35.578 ns (step:4)
> Per elem: 120 cycles(tsc) 33.592 ns (step:8)
> Per elem: 116 cycles(tsc) 32.362 ns (step:16)
> Per elem: 113 cycles(tsc) 31.476 ns (step:32)
> Per elem: 110 cycles(tsc) 30.633 ns (step:64)
> Per elem: 110 cycles(tsc) 30.596 ns (step:128)
>
> Compared to the previous results (see below), the list variant got
> faster, but the array variant is still the faster of the two. The array
> variant lost a little performance. I think this may be related to the
> stats counters being added/moved inside the loop in this patchset.
>
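As a rough aid for reading the figures quoted above, the per-element cycle counts can be expressed as a saving relative to the 199-cycle single-page baseline. This is only an illustrative sketch: the numbers are copied from the quoted benchmark output, while the names (`saving_pct`, `summarize`, the two dictionaries) are hypothetical and are not part of page_bench04_bulk or the patch series.

```python
# Per-element cycle counts copied from the quoted benchmark output.
# All identifiers here are hypothetical helpers for illustration only.
BASELINE_CYCLES = 199  # single_page alloc+put

LIST_CYCLES = {1: 206, 2: 154, 3: 145, 4: 142, 8: 142,
               16: 137, 32: 135, 64: 134, 128: 133}
ARRAY_CYCLES = {1: 202, 2: 144, 3: 134, 4: 128, 8: 120,
                16: 116, 32: 113, 64: 110, 128: 110}


def saving_pct(cycles, reference=BASELINE_CYCLES):
    """Percentage of cycles saved relative to the reference (negative = slower)."""
    return 100.0 * (reference - cycles) / reference


def summarize():
    """Return (step, list saving %, array saving %) for every bulk size."""
    return [(step, saving_pct(LIST_CYCLES[step]), saving_pct(ARRAY_CYCLES[step]))
            for step in sorted(LIST_CYCLES)]


if __name__ == "__main__":
    print("step  list%  array%")
    for step, list_pct, array_pct in summarize():
        print(f"{step:4d}  {list_pct:5.1f}  {array_pct:6.1f}")
```

With a bulk size of 1 both variants come out slightly slower than the baseline, which is consistent with the bulk path only paying off once more than one page is requested per call.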

If you are feeling particularly brave, take a look at
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-percpu-local_lock-v1r10

It's a prototype series rebased on top of the bulk allocator and this
version has not even been boot tested. While it'll get rough treatment
during review, it should reduce the cost of the stat updates in the
bulk allocator as a side-effect.

--
Mel Gorman
SUSE Labs
