Re: [RFC 00/15] x86_64: Optimize percpu accesses
From: Jeremy Fitzhardinge
Date: Thu Jul 10 2008 - 13:48:32 EST
Christoph Lameter wrote:
> Jeremy Fitzhardinge wrote:
>> The base address of the percpu area and the offsets from that base are
>> completely independent values.
> Definitely.
>> The addressing modes:
>> * ABS
>> * off(%rip)
>> are exactly equivalent in what offsets they can generate, so long as *at
>> link time* the percpu *symbols* are within 2G of the code addressing
>> them. *After* the addressing mode has generated an effective address
>> (by whatever means it likes), the %gs: override applies the segment
>> base, which can therefore offset the effective address to anywhere at all.
> Right. The problem is with the percpu area handled by the linker. That
> percpu area is used by the boot cpu, and later we set up additional per
> cpu areas. Those can be placed arbitrarily if one goes through a table
> of pointers to these areas.
Yes, but the offset is the same either way. When you want a cpu to
refer to its own percpu memory, regardless of where it is in memory, you
just reload the gs base. The offsets are the same everywhere, and are
computed by the linker without knowledge of, or reference to, where the
final address will end up.
In other words, at source level:
    a = x86_read_percpu(foo);
will generate
    mov %gs:percpu__foo, %rax
where the linker decides the value of percpu__foo, which can be up to
4G. Or if we use rip-relative:
    mov %gs:percpu__foo(%rip), %rax
we end up with the same result, except that the generated instruction is
one byte shorter, since the rip-relative form needs no SIB byte.
In the final linked code, percpu__foo ends up as a hardcoded constant
address, say 0x7838.
Now if we allocate cpu 43's percpu data at 0xfffffffff7198000, we load
the %gs base with that value, and the instruction is still
    mov %gs:0x7838, %rax
and the computed address will be 0xfffffffff7198000 + 0x7838 =
0xfffffffff719f838.
And cpu 62 has its percpu data at 0xffffffffe3819000, and the
instruction is still
    mov %gs:0x7838, %rax
and the computed address for its version of percpu__foo is
0xffffffffe3819000 + 0x7838 = 0xffffffffe3820838.
Note that it doesn't matter how you decide to place the percpu data, so
long as you can load the address into the %gs base.
> However, that does not work if one calculates the virtual address
> instead of looking up a physical address.
Calculate a virtual address for what? Physical address for what? If
you have a large virtual region allocating 256M of percpu space, er, per
cpu, then you just load %gs base with percpu_region_base + cpuid *
256M. It has no effect on the instructions accessing that percpu space.
J