Re: [PATCH UPDATED] percpu: use dynamic percpu allocator as the default percpu allocator

From: Ingo Molnar
Date: Mon Mar 30 2009 - 07:54:57 EST



* Tejun Heo <tj@xxxxxxxxxx> wrote:

> Impact: use dynamic allocator for most archs w/o
> CONFIG_HAVE_SETUP_PER_CPU_AREA
>
> This patch makes most !CONFIG_HAVE_SETUP_PER_CPU_AREA archs use
> dynamic percpu allocator. The first chunk is allocated using
> embedding helper and 8k is reserved for modules. This ensures
> that the new allocator behaves almost identically to the original
> allocator as far as static percpu variables are concerned, so it
> shouldn't introduce much breakage.
>
> s390 and alpha use custom SHIFT_PERCPU_PTR() to work around the
> addressing range limit that their addressing model imposes.
> Unfortunately, this breaks if the address is specified using a
> variable, so for now, the two archs aren't converted.
>
> The following architectures are affected by this change.
>
> * sh
> * arm
> * cris
> * mips
> * sparc(32)
> * blackfin
> * avr32
> * parisc
> * m32r
> * powerpc(32)
>
> As this change makes the dynamic allocator the default one,
> CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is replaced with its inverse -
> CONFIG_HAVE_LEGACY_PER_CPU_AREA, which is added to yet-to-be
> converted archs. These archs implement their own
> setup_per_cpu_areas() and the conversion is not trivial.
>
> * powerpc(64)
> * sparc(64)
> * ia64
> * alpha
> * s390
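
The inversion described above can be pictured with a small user-space
sketch. This is only an illustration under stated assumptions: the
PERCPU_ALLOCATOR_NAME macro and uses_dynamic_percpu() are hypothetical
names made up for the example, and the real selection is done through
Kconfig and the build system rather than a string macro.

```c
#include <string.h>

/*
 * Hypothetical stand-in for the Kconfig symbol: an arch that is not yet
 * converted would define CONFIG_HAVE_LEGACY_PER_CPU_AREA; every other
 * arch gets the dynamic allocator without having to opt in.
 */
#ifdef CONFIG_HAVE_LEGACY_PER_CPU_AREA
#define PERCPU_ALLOCATOR_NAME "legacy"
#else
#define PERCPU_ALLOCATOR_NAME "dynamic"
#endif

/* Returns 1 when this build would use the dynamic allocator, else 0. */
static int uses_dynamic_percpu(void)
{
    return strcmp(PERCPU_ALLOCATOR_NAME, "dynamic") == 0;
}
```

Built without -DCONFIG_HAVE_LEGACY_PER_CPU_AREA the helper reports the
dynamic allocator; built with it, the legacy one. The point is only
that the default flips: archs now opt out instead of opting in.
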
>
> Boot and batch alloc/free tests on x86_32 with debug code (x86_32
> doesn't use default first chunk initialization). Compile tested
> on sparc(32), powerpc(32), arm and alpha.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Paul Mundt <lethal@xxxxxxxxxxxx>
> Cc: Russell King <rmk@xxxxxxxxxxxxxxxx>
> Cc: Mikael Starvik <starvik@xxxxxxxx>
> Cc: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
> Cc: David S. Miller <davem@xxxxxxxxxxxxx>
> Cc: Bryan Wu <cooloney@xxxxxxxxxx>
> Cc: Kyle McMartin <kyle@xxxxxxxxxxx>
> Cc: Matthew Wilcox <matthew@xxxxxx>
> Cc: Grant Grundler <grundler@xxxxxxxxxxxxxxxx>
> Cc: Hirokazu Takata <takata@xxxxxxxxxxxxxx>
> Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
> Cc: Richard Henderson <rth@xxxxxxxxxxx>
> Cc: Ivan Kokshaysky <ink@xxxxxxxxxxxxxxxxxxxx>
> Cc: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
> Cc: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
> ---

> Okay, this should keep s390 and alpha working till proper solution
> is found. Martin, can you please verify? Ingo, please feel free
> to push this upstream (or -next) once Martin acks.
>
> arch/alpha/Kconfig | 3 +++
> arch/ia64/Kconfig | 3 +++
> arch/powerpc/Kconfig | 3 +++
> arch/s390/Kconfig | 3 +++
> arch/sparc/Kconfig | 3 +++
> arch/x86/Kconfig | 3 ---
> include/linux/percpu.h | 12 +++++++++---
> init/main.c | 24 ------------------------
> kernel/module.c | 6 +++---
> mm/Makefile | 2 +-
> mm/allocpercpu.c | 28 ++++++++++++++++++++++++++++
> mm/percpu.c | 40 +++++++++++++++++++++++++++++++++++++++-
> 12 files changed, 95 insertions(+), 35 deletions(-)
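
For anyone reviewing the arch bits, the first-chunk layout from the
changelog (static area, then 8k reserved for modules, then dynamic
space, one such unit per CPU) can be sketched in user space roughly as
below. This is a simplified model, not the code from the patch: all
names are hypothetical, page alignment of the unit size is ignored, and
the generic pointer shift is assumed to be plain pointer-plus-offset
arithmetic.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH        4           /* hypothetical CPU count */
#define MODULE_RESERVE_SKETCH (8u << 10)  /* the 8k reserved for modules */

/* Layout of one per-cpu unit inside the first chunk. */
struct unit_layout {
    size_t static_size;    /* static percpu variables */
    size_t reserved_size;  /* module percpu variables */
    size_t dyn_size;       /* room left for dynamic allocations */
    size_t unit_size;      /* sum of the three; one unit per CPU */
};

/* Build a layout from the static and dynamic sizes (no alignment). */
static struct unit_layout make_layout(size_t static_size, size_t dyn_size)
{
    struct unit_layout l = { static_size, MODULE_RESERVE_SKETCH, dyn_size, 0 };

    l.unit_size = l.static_size + l.reserved_size + l.dyn_size;
    return l;
}

/* Offset to add to a percpu address to reach the copy owned by @cpu. */
static size_t cpu_offset(const struct unit_layout *l, unsigned int cpu)
{
    assert(cpu < NR_CPUS_SKETCH);
    return (size_t)cpu * l->unit_size;
}

/*
 * Generic shift: ordinary pointer arithmetic, so it works equally well
 * when the base address is held in a variable.
 */
static void *shift_percpu_ptr(void *base, size_t offset)
{
    return (char *)base + offset;
}
```

With a 4k static area and 4k of dynamic space the unit comes out at
16k, and the copy for CPU 2 sits 32k past the copy for CPU 0. Because
the shift here is plain arithmetic it accepts an address computed at
run time, which is the case the changelog says the custom s390 and
alpha SHIFT_PERCPU_PTR() variants cannot handle yet.
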

We also need the Ack from Davem for Sparc32, the Ack from Martin, an
Ack from Ben for the PowerPC bits and an ack from Tony for the IA64
bits. We also need the ack from Andrew and Rusty for the MM and
module bits. Plus a final Reviewed-by of the all-architectures
percpu allocator concepts from Christoph and Nick would be nice as
well, just in case we missed some trick or failed to consider a
complication.

We need these acks both for technical correctness and for the form
of the proposed workflow - and the acks need to be in the commit
logs as well. (i.e. we need another rebase once the dust settles)

I know this is a _lot_ of paperwork but this is the proper way to do
it: the main weight of the impact is no longer on x86 but on the
other architectures, and it is non-trivial for them, so we'll do
whatever is most convenient for the architectures and maintainers
affected.

It was fine and efficient to prototype this on x86 (which was the
architecture historically most infested with legacy percpu
complications), but the work flow spreads out and slows down from
now on.

We may also need to split this up into per architecture bits if that
is requested, so that they can merge and switch this allocator on at
a pace that suits their own testing and merging schedules best. More
complications beyond the s390 one might be discovered.

Furthermore, while i don't mind pulling it into tip/core if everyone
agrees with that, and can push it to -next as well, i think these
bits are best suited for -mm from this point on: it is the tree best
set up for more complex many-architecture merges.

The bits obviously look fine from an x86 perspective as well (they
have essentially no impact there) so you can put in my ack for that
in any case.

Thanks,

Ingo