Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation

From: Ingo Molnar
Date: Tue Feb 24 2009 - 09:12:49 EST



* Tejun Heo <tj@xxxxxxxxxx> wrote:

> What's missing is unification of static and dynamic accessors
> and thus the faster accessors - percpu_read() and friends -
> for dynamic ones. This will be the next round of patches.

Ok, good - we are in agreement then and i'll wait for those
patches.

And i think i finally decoded the real source of the disconnect
:-)

It's still about this restriction:

+	/*
+	 * If large page isn't supported, there's no benefit in doing
+	 * this. Also, embedding allocation doesn't play well with
+	 * NUMA.
+	 */
+	if (!cpu_has_pse || pcpu_need_numa())
+		return -EINVAL;

This is what makes no sense (why force the static percpu area
into 4K mappings on NUMA).

i think you do it because you misunderstood my original 2MB TLB
static area suggestion. setup_pcpu_embed() does this now:

+	pcpue_ptr = pcpu_alloc_bootmem(0, num_possible_cpus() * pcpue_unit_size,
+				       PAGE_SIZE);

That is not NUMA-friendly indeed.

What should be done instead is to up the unit size to 2MB as i
suggested, and to allocate a 2MB sized and 2MB aligned
NUMA-correct area for each CPU, via bootmem.
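
A minimal userspace sketch of the sizing and alignment arithmetic
this implies. It is not kernel code: aligned_alloc() merely stands in
for a node-local bootmem allocation, and LARGE_PAGE_SIZE,
round_up_pow2(), is_aligned() and alloc_unit() are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LARGE_PAGE_SIZE ((size_t)2 << 20)	/* 2MB; hypothetical name */

/* Round size up to a multiple of align; align must be a power of two. */
static size_t round_up_pow2(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Nonzero if ptr sits on an align-byte boundary (align a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
	return ((uintptr_t)ptr & (align - 1)) == 0;
}

/*
 * One separate allocation per CPU, 2MB sized and 2MB aligned.
 * In the kernel this would be a per-node bootmem allocation;
 * aligned_alloc() is only a userspace stand-in (assumption).
 */
static void *alloc_unit(size_t static_size)
{
	size_t unit_size;

	if (static_size == 0)
		return NULL;
	unit_size = round_up_pow2(static_size, LARGE_PAGE_SIZE);
	return aligned_alloc(LARGE_PAGE_SIZE, unit_size);
}
```

Calling alloc_unit() once per CPU, instead of once for all CPUs, is
what leaves room for each unit to come from that CPU's own node.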

To quote my original mail:

> > - allocate the static percpu area using bootmem-alloc, but
> > using a 2MB alignment parameter and a 2MB aligned size. Then
> > we can remap it to some convenient and undisturbed virtual
> > memory area, using 2MB TLBs. [*]

I.e. each individual 2MB allocated largepage can then be
remapped as a 2MB TLB to the high (vmalloc) area. Followed by
ordinary 4K mappings for regular percpu_alloc() pages.

( and the partial, unused pages within this initial chunk are
returned to bootmem. )
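
As a rough illustration of how much could be handed back, here is a
small userspace sketch that counts the whole 4K pages left unused in
one 2MB unit. The names SMALL_PAGE_SIZE, LARGE_PAGE_SIZE and
unused_small_pages() are hypothetical, and "used_bytes" is assumed to
cover everything that must stay resident in the unit.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_PAGE_SIZE ((size_t)4 << 10)	/* 4K; hypothetical name */
#define LARGE_PAGE_SIZE ((size_t)2 << 20)	/* 2MB; hypothetical name */

/*
 * Number of whole 4K pages at the tail of a 2MB unit that are not
 * touched by the first used_bytes bytes. A partially used page
 * counts as used.
 */
static size_t unused_small_pages(size_t used_bytes)
{
	size_t used_pages;

	assert(used_bytes <= LARGE_PAGE_SIZE);
	used_pages = (used_bytes + SMALL_PAGE_SIZE - 1) / SMALL_PAGE_SIZE;
	return LARGE_PAGE_SIZE / SMALL_PAGE_SIZE - used_pages;
}
```

For example, with 300KB in use, 75 of the 512 small pages are
occupied and unused_small_pages(300 * 1024) returns 437.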

That will be NUMA-friendly and i suspect we should also use it
on SMP just to get that aspect of the code tested better.

Do _not_ allocate the units together in one bootmem allocation
because that's not NUMA-friendly.

Ok?

Ingo