Re: [PATCH 09/10] percpu: implement new dynamic percpu allocator

From: Tejun Heo
Date: Wed Feb 25 2009 - 22:18:18 EST


Hello,

Luck, Tony wrote:
> ia64 started out with a pinned TLB entry to map the percpu space to the
> top 64K of address space (so that the compiler can generate ld/st instructions
> with a small negative offset from register r0 to access local-to-this-cpu
> objects).
>
> Then we started using one of the ar.k* registers to hold the base
> physical address for each cpu's per-cpu area so that early parts of
> machine check code (which runs with MMU off) can access per-cpu variables.
>
> Finally we found that certain transaction processing benchmarks ran faster
> if we let the cpu have free access to one extra TLB entry ... so we
> stopped pinning the per-cpu area, and wrote a s/w fault handler to
> insert the mapping on demand (using the ar.k3 register to get the
> physical address for the mapping).
>
> N.B. ar.k3 is a medium-slow register ... I wouldn't want to use it
> in the code sequence for *every* per-cpu variable access.

Ah... I see, so the 64k limit comes from the small offset. I think
what we can do is use the first chunk for static percpu variables.
We'll still be able to use the same accessor by doing something like...

/* pseudocode: r0 and ar.k3 stand for the ia64 registers */
#define unified_percpu_accessor(ptr) ({				\
	__builtin_constant_p(ptr) ?				\
		r0 - unit_size + (ptr) :	/* static */	\
		ar.k3 + (ptr);			/* dynamic */	\
})
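
To make the dispatch above concrete, here is a rough user-space sketch
of the same idea, not the actual ia64 implementation. Everything in it
is a hypothetical stand-in: percpu_units simulates the per-cpu memory,
static_base() plays the role of the fixed r0-relative mapping,
dynamic_base() plays the role of reading the base from ar.k3, and the
two hit counters exist only to show which path was taken. It assumes
GCC or Clang, since it relies on __builtin_constant_p and statement
expressions.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS   2
#define UNIT_SIZE 64	/* hypothetical per-cpu unit size in bytes */

/* Simulated per-cpu units; a kernel would map real per-cpu memory. */
static long percpu_units[NR_CPUS][UNIT_SIZE / sizeof(long)];
static int this_cpu;		/* stand-in for "the CPU we are running on" */
static int fast_hits, slow_hits;	/* illustration only */

/* Stand-in for the fixed mapping reachable at a small offset from r0. */
static void *static_base(void)
{
	fast_hits++;
	return percpu_units[this_cpu];
}

/* Stand-in for fetching the per-cpu base from ar.k3 (the slower path). */
static void *dynamic_base(void)
{
	slow_hits++;
	return percpu_units[this_cpu];
}

/*
 * One accessor for both kinds of offsets.  __builtin_constant_p() does
 * not evaluate its argument, so (off) is evaluated exactly once.
 * Offsets are assumed to be multiples of sizeof(long).
 */
#define unified_percpu_ptr(off) ({					\
	unsigned char *base__ = __builtin_constant_p(off)		\
		? static_base() : dynamic_base();			\
	size_t off__ = (off);						\
	assert(off__ + sizeof(long) <= UNIT_SIZE);			\
	(long *)(base__ + off__);					\
})
```

A literal offset such as unified_percpu_ptr(8) takes the static path,
while an offset only known at run time takes the dynamic one. In the
sketch both paths land in the same simulated unit; only the way the
base is obtained differs.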

So, dynamic ones will be slower than normal ones but faster than what
we currently have (it will be faster than indirect pointer
dereferencing, right?) while keeping static accesses fast. Does it
sound okay to you? Also, does anyone know whether there's a working
ia64 emulator? There doesn't seem to be any and it seems almost
impossible to get hold of an actual ia64 machine over here. :-(

Thanks.

--
tejun