Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area
From: Jeremy Fitzhardinge
Date: Tue Jul 01 2008 - 17:10:55 EST
Eric W. Biederman wrote:
Jeremy Fitzhardinge <jeremy@xxxxxxxx> writes:
H. Peter Anvin wrote:
Eric W. Biederman wrote:
The zero-based PDA mechanism requires the introduction of a new ELF segment
based at vaddr 0, which is sufficiently unusual that it wouldn't surprise me
if it's triggering some toolchain bug.
Agreed. Given the previous description, my hunch is that the bug is occurring
during objcopy, if vmlinux is good and the compressed kernel is bad.
Actually, it's not all that unusual... it's pretty common in various
restricted environments. That being said, it's probably uncommon for *64-bit*
code.
Well, it's also unusual because 1) it's vaddr 0, but paddr <high>, and 2) the
PHDRs are not sorted by vaddr order. 2) might actually be a bug.
I just looked and gcc does not use this technique for thread local data.
Which technique? It does assume you put the thread-local data near %gs
(%fs in userspace), and it uses a small offset (positive or negative) to
reach it.
At present, x86-64 uses %gs-relative addressing only to reach the
pda, always at small positive offsets. It always accesses
per-cpu data in a two-step process of getting the base of per-cpu data,
then offsetting to find the particular variable.
x86-32 has no pda, and arranges %fs so that %fs:variable gets the percpu
variant of variable. The offsets are always quite large.
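The two access patterns described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct pda`, `struct percpu_area`, `gs_base`, `fs_base` and `switch_to_cpu()` are hypothetical stand-ins for the real segment bases and kernel structures, not kernel code.

```c
#include <stddef.h>

#define NR_CPUS 2   /* hypothetical CPU count for this sketch */

/* Hypothetical per-cpu variables, grouped into one area per CPU. */
struct percpu_area {
    int  runqueue_len;
    long irq_count;
};

/* Simplified stand-in for the pda: a small structure that records
 * where this CPU's copy of the per-cpu data lives. */
struct pda {
    ptrdiff_t data_offset;   /* byte offset from CPU 0's area */
};

static struct percpu_area areas[NR_CPUS];
static struct pda pdas[NR_CPUS];

/* Stand-ins for the segment bases a real CPU would have loaded. */
static struct pda *gs_base;           /* x86-64 style: points at the pda */
static struct percpu_area *fs_base;   /* x86-32 style: points at the area */

static void switch_to_cpu(int cpu)
{
    pdas[cpu].data_offset = (char *)&areas[cpu] - (char *)&areas[0];
    gs_base = &pdas[cpu];
    fs_base = &areas[cpu];
}

/* x86-64 style: read the per-cpu offset from the pda (a small offset
 * from the segment base), then add it to the variable's address. */
static long read_irq_count_two_step(void)
{
    ptrdiff_t off = gs_base->data_offset;            /* step 1 */
    char *var = (char *)&areas[0].irq_count + off;   /* step 2 */
    return *(long *)var;
}

/* x86-32 style: a single access relative to the segment base. */
static long read_irq_count_direct(void)
{
    return fs_base->irq_count;
}
```

Both readers return the same value for whichever CPU `switch_to_cpu()` selected; the difference is only whether the per-cpu base is fetched as a separate step.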
My initial concern about all of this, not making symbols section-relative,
is relieved, as this all appears to be a 64-bit arch thing where that doesn't
matter.
Why's that? I thought you cared particularly about making the x86-64
kernel relocatable for kdump, and that using non-absolute symbols was
part of that?
Has anyone investigated using the technique gcc uses for thread local storage?
http://people.redhat.com/drepper/tls.pdf
The powerpc guys tried using gcc-level thread-local storage, but it
doesn't work well. per-cpu data and per-thread data have different
constraints, and it's hard to tell gcc about them. For example, if you
have a section of preemptable code in your function, it's hard to tell
gcc not to cache a "thread-local" variable across it, even though we
could have switched CPUs in the meantime.
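A small userspace simulation of that hazard, assuming a hypothetical per-cpu `counter` array and a `preempt_and_migrate()` helper that stands in for the scheduler moving the task to another CPU:

```c
#define NR_CPUS 2   /* hypothetical CPU count for this sketch */

static int counter[NR_CPUS];   /* hypothetical per-cpu counter */
static int current_cpu;        /* stand-in for "the CPU we are running on" */

/* Stand-in for a preemption point where the scheduler migrates us. */
static void preempt_and_migrate(int new_cpu)
{
    current_cpu = new_cpu;
}

/* Models a compiler that treats the variable as thread-local and so
 * computes its address once, before the preemptible region. */
static void increment_cached(void)
{
    int *p = &counter[current_cpu];   /* address taken on the old CPU */
    preempt_and_migrate(1);
    (*p)++;                           /* touches the old CPU's copy */
}

/* Recomputes the address after the preemptible region. */
static void increment_recomputed(void)
{
    preempt_and_migrate(1);
    counter[current_cpu]++;           /* touches the new CPU's copy */
}
```

Starting on CPU 0, `increment_cached()` bumps `counter[0]` even though the code is by then running on CPU 1, which is the behaviour the kernel cannot allow for per-cpu data.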
In particular, using the local exec model so we can say:
    movq %fs:x@tpoff,%rax
to load the contents of a per-cpu variable x into %rax?
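For reference, a minimal userspace sketch of what asking for the local exec model looks like in C. It assumes GCC or Clang, a target with TLS support, and a build as an ordinary executable (not a shared library); the variable `x` and the function `load_x()` are illustrative only, and this says nothing about whether the kernel build could use it.

```c
/* Thread-local variable pinned to the local-exec TLS model, so the
 * compiler may address it at a fixed offset from the thread pointer
 * (%fs in x86-64 userspace). */
static __thread long x __attribute__((tls_model("local-exec"))) = 42;

/* On x86-64 this load is typically a single %fs-relative mov of the
 * form quoted above; other targets use their own TLS sequences. */
long load_x(void)
{
    return x;
}
```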
If we can use that model it should make it easier to interface with things like
the stack protector code. Although we would still need to be very careful
about thread switches.
You mean cpu switches? We don't really have a notion of thread-local
data in the kernel, other than things hanging off the kernel stack.
J