Re: [PATCH] per-cpu change

From: Rusty Russell (rusty@rustcorp.com.au)
Date: Fri May 02 2003 - 23:40:57 EST


In message <16050.41490.798865.517036@napali.hpl.hp.com> you write:
> - On NUMA, you want to allocate the per-CPU areas in node-local memory.

Thanks: I'd forgotten about this. Even "big smp" machines often have
different memory latencies.

I need to allow archs to override the core allocation.

> - It's important to initialize the per-CPU area early enough. We
> currently do it in cpu_init(). Anything later would probably break
> things.

Yep, I imagine you'd explicitly call setup_per_cpu_areas() in
cpu_init. (the later call will do nothing).

> I am a little concerned with supporting per-CPU storage in modules.
> What happens if the per-CPU area fills up? With the normal kernel,
> you'll get a link-time error, but with modules, you won't know until
> you try to insert a module.

Yes. In fact, you get a boot error not a link error in my
implementation.

My previous implementation used a linked-list of memory chunks, like
so:

        Core kernel percpu First dynamic percpu Second dynamic percpu
        +----+----+---...-+ -> +----+----+---...-+ -> +----+----+---...-+ ->
         CPU0 CPU1 CPU2...

You can then use the same __per_cpu_offset[], as the spacing between
each cpu's variables in the dynamically allocated sections is the same
as in the core kernel. You can __alloc_percpu any size up to the
initial inter-cpu spacing (which was equal to the initial amount of
static per-cpu data, or 4k, whatever is more).

But if the cpu's variables are not consecutive (ie your NUMA point),
this scheme breaks down: you'd need to allocate areas with the same
spacing between them, which won't happen.

So I think the current, simpler choice is preferable.

> But the alternative of using something similar to the user-level TLS
> model doesn't look exactly attractive either.

Yes, I spoke with Alan Modra about what he had to do on PPC64, and ran
screaming. We can live with more constraints, in the kernel, I think,
especially when the result is so fast and simple.

Thanks,
Rusty.

--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed May 07 2003 - 22:00:17 EST