Re: [patch 2/2] mm: Reimplementation of alloc_percpu

From: Rusty Russell
Date: Tue Jan 18 2005 - 19:09:43 EST


On Tue, 2005-01-18 at 20:45 +0530, Ravikiran G Thirumalai wrote:
> On Tue, Jan 18, 2005 at 12:30:32PM +1100, Rusty Russell wrote:
> > On Tue, 2005-01-18 at 00:06 +0530, Ravikiran G Thirumalai wrote:
> > > ...
> > > The allocator can be easily modified to use __per_cpu_offset[] table at a later
> > > stage by:
> > > 1. Allocating ALIGN(__per_cpu_end - __per_cpu_start, PAGE_SIZE) for the
> > > static percpu areas and populating __per_cpu_offset[] offset table
> > > 2. Making PCPU_BLKSIZE the same as the static per cpu area size above
> > > 3. Serving dynamic percpu requests from modules etc from blocks by
> > > returning ret -= __per_cpu_offset[0] from a percpu block. This way
> > > modules need not have a limit on static percpu areas.
> >
> > Unfortunately ia64 breaks (3). They have pinned TLB entries covering
> > 64k, which they put the static per-cpu data into. This is used for
> > local_inc, etc, and David Mosberger loved that trick (this is why my
> > version allocated from that first reserved block for modules' static
> > per-cpu vars).
>
> Hmmm... if we change (1) to allocate PERCPU_ENOUGH_ROOM, will the math
> then work out? We will still have a limit on static per-cpu areas in
> modules, but alloc_percpu can use the same __per_cpu_offset[] table.
> Will this work?

I think so.
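
To make the arithmetic concrete, here is a rough userspace sketch of the
"ret -= __per_cpu_offset[0]" idea. It is not kernel code: the sim_* names
and both constants are made up for illustration, each offset is simply the
address of a malloc'ed area (the real table is relative to the static
per-cpu section), and the allocator is a bump pointer with no free path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SIM_NR_CPUS     4       /* hypothetical CPU count */
#define SIM_ENOUGH_ROOM 32768   /* stand-in for PERCPU_ENOUGH_ROOM */

static unsigned char *sim_area[SIM_NR_CPUS];      /* one area per CPU */
static uintptr_t sim_per_cpu_offset[SIM_NR_CPUS]; /* stand-in offset table */
static size_t sim_next;                           /* bump pointer into each area */

/* Allocate one fixed-size area per CPU and fill in the offset table.
 * Returns 0 on success, -1 on failure (earlier areas are not freed here). */
int sim_setup(void)
{
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
        sim_area[cpu] = calloc(1, SIM_ENOUGH_ROOM);
        if (!sim_area[cpu])
            return -1;
        sim_per_cpu_offset[cpu] = (uintptr_t)sim_area[cpu];
    }
    sim_next = 0;
    return 0;
}

/* Carve 'size' bytes out of every CPU's area at the same position.
 * The handle is CPU 0's address minus CPU 0's offset, so adding any
 * CPU's offset to it yields that CPU's copy. */
int sim_alloc_percpu(size_t size, uintptr_t *handle)
{
    size_t aligned = (size + 7) & ~(size_t)7;

    if (aligned < size || aligned > SIM_ENOUGH_ROOM - sim_next)
        return -1;      /* the fixed-size area is exhausted */

    uintptr_t cpu0_addr = (uintptr_t)(sim_area[0] + sim_next);
    *handle = cpu0_addr - sim_per_cpu_offset[0];
    sim_next += aligned;
    return 0;
}

/* The lookup is a single add, the same as for static per-cpu variables. */
void *sim_per_cpu_ptr(uintptr_t handle, int cpu)
{
    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    return (void *)(handle + sim_per_cpu_offset[cpu]);
}
```

The fixed SIM_ENOUGH_ROOM is where the remaining limit shows up: once the
area is used up, sim_alloc_percpu() fails rather than growing it.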

> But what I am concerned about is arches like x86_64, which currently
> do not maintain the relation:
> __per_cpu_offset[n] = __per_cpu_offset[0] + static_percpu_size * n ---> (A)
> Correct me if I am wrong, but both our methods for alloc_percpu to use
> __per_cpu_offset[] depend on the static per-cpu areas being virtually
> contiguous (with relation (A) above being maintained).
> If arches cannot sew up node local pages to form a virtually contiguous
> block, maybe because setup_per_cpu_areas happens early during boot,
> then we will have a problem.

The per-cpu areas don't actually have to be contiguous, although that
makes it easier. Such arches can reserve virtual address space to extend
their per-cpu areas. I think this is a worthwhile tradeoff if they want
to do this.
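
As a rough illustration of why relation (A) is a convenience rather than
a requirement: a lookup that goes through the offset table never uses the
stride, so it gives the right answer for scattered areas too. The function
names and the plain-integer addresses below are made up for the example;
this is a userspace sketch, not what any arch actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if offset[n] == offset[0] + stride * n for every n,
 * i.e. relation (A) holds for this table; 0 otherwise. */
int rel_a_holds(const uintptr_t *offset, size_t ncpus, uintptr_t stride)
{
    for (size_t n = 0; n < ncpus; n++)
        if (offset[n] != offset[0] + stride * n)
            return 0;
    return 1;
}

/* Table-based lookup: valid whether or not relation (A) holds. */
uintptr_t addr_via_table(const uintptr_t *offset, size_t ncpus,
                         uintptr_t handle, size_t cpu)
{
    assert(cpu < ncpus);
    return handle + offset[cpu];
}

/* Stride-based lookup: only correct when relation (A) holds. */
uintptr_t addr_via_stride(uintptr_t offset0, uintptr_t stride,
                          uintptr_t handle, size_t cpu)
{
    return handle + offset0 + stride * cpu;
}
```

What every CPU's area does need is the same layout and enough room behind
it, which is where reserving virtual address space comes in.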

Cheers,
Rusty.
--
A bad analogy is like a leaky screwdriver -- Richard Braakman
