Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area

From: Jeremy Fitzhardinge
Date: Tue Jul 01 2008 - 17:10:55 EST


Eric W. Biederman wrote:
> Jeremy Fitzhardinge <jeremy@xxxxxxxx> writes:

>> H. Peter Anvin wrote:
>>> Eric W. Biederman wrote:
>>>>> The zero-based PDA mechanism requires the introduction of a new ELF segment
>>>>> based at vaddr 0 which is sufficiently unusual that it wouldn't surprise me
>>>>> if it's triggering some toolchain bug.
>>>>
>>>> Agreed. Given the previous description my hunch is that the bug is occurring
>>>> during objcopy, if vmlinux is good and the compressed kernel is bad.

>>> Actually, it's not all that unusual... it's pretty common in various
>>> restricted environments. That being said, it's probably uncommon for *64-bit*
>>> code.
>>
>> Well, it's also unusual because 1) it's vaddr 0, but paddr <high>, and 2) the
>> PHDRs are not sorted by vaddr order. 2) might actually be a bug.

> I just looked and gcc does not use this technique for thread local data.

Which technique? It does assume you put the thread-local data near %gs (%fs in userspace), and it uses a small offset (positive or negative) to reach it.

At present, x86-64 only uses %gs-relative addressing to reach the pda, always with small positive offsets. It always accesses per-cpu data in a two-step process: get the base of this CPU's per-cpu area, then offset to find the particular variable.

x86-32 has no pda, and arranges %fs so that %fs:variable gets the percpu variant of variable. The offsets are always quite large.
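Roughly, and only as an illustration (userspace C with made-up names and layout, not the kernel's actual macros or pda layout), the difference between the two schemes looks like this:

#include <stdio.h>
#include <stddef.h>

#define NR_FAKE_CPUS 2

/* The "per-cpu area": each CPU gets its own copy of these variables. */
struct percpu_area {
	long counter;
	long other;
};
static struct percpu_area cpu_area[NR_FAKE_CPUS];

/*
 * x86-64 style: a small pda, reachable with small %gs offsets, holds a
 * pointer to this CPU's per-cpu area.  Step 1 is the %gs-relative load
 * of that pointer (roughly "movq %gs:pda_data_offset, %rax"); step 2
 * adds the variable's offset within the area.
 */
struct fake_pda {
	void *data_offset;
};
static struct fake_pda cpu_pda[NR_FAKE_CPUS];

static long read_counter_64style(int cpu)
{
	void *base = cpu_pda[cpu].data_offset;			/* step 1 */
	return *(long *)((char *)base +
			 offsetof(struct percpu_area, counter));/* step 2 */
}

/*
 * x86-32 style: the segment register itself points at this CPU's area,
 * so one access with a (large) offset reaches the variable directly,
 * roughly "movl %fs:per_cpu__counter, %eax".
 */
static long read_counter_32style(int cpu)
{
	struct percpu_area *base = &cpu_area[cpu];	/* what %fs points at */
	return base->counter;				/* single access */
}

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_FAKE_CPUS; cpu++) {
		cpu_pda[cpu].data_offset = &cpu_area[cpu];
		cpu_area[cpu].counter = 100 + cpu;
	}
	printf("%ld %ld\n", read_counter_64style(1), read_counter_32style(1));
	return 0;
}

The 64-bit path pays for the extra load of the base on every access, which, as I understand it, is part of what folding the pda and the per-cpu area together is meant to get rid of.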

> My initial concern about all of this (not making symbols section relative) is
> relieved, as this all appears to be a 64-bit arch thing where that doesn't
> matter.

Why's that? I thought you cared particularly about making the x86-64 kernel relocatable for kdump, and that using non-absolute symbols was part of that?

> Has anyone investigated using the technique gcc uses for thread local storage?
> http://people.redhat.com/drepper/tls.pdf

The powerpc guys tried using gcc-level thread-local storage, but it doesn't work well. Per-cpu data and per-thread data have different constraints, and it's hard to tell gcc about them. For example, if you have a section of preemptible code in your function, it's hard to tell gcc not to cache a "thread-local" variable across it, even though we could have switched CPUs in the meantime.
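Something like this (a userspace sketch with invented names, gcc-specific) shows the constraint mismatch: pretend the __thread variable were per-cpu data and the empty asm were a point where the task can be preempted and migrated to another CPU.

__thread long fake_percpu_counter;	/* pretend this were per-cpu data */

/* Stands in for preempt_enable()/schedule(): the task may come back on
 * a different CPU.  A memory clobber makes gcc reload values, but it
 * does not make it recompute thread-local addresses. */
static inline void fake_preemption_point(void)
{
	asm volatile("" ::: "memory");
}

void touch_counter(void)
{
	long *p = &fake_percpu_counter;	/* thread base + offset, taken from
					 * the segment register as it is now */
	(*p)++;				/* old CPU's copy */
	fake_preemption_point();	/* might now be on another CPU */
	(*p)++;				/* still the old CPU's copy; gcc is
					 * allowed to do this kind of caching
					 * itself even without the explicit
					 * pointer, and there's no clean way
					 * to forbid it */
}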

> In particular using the local exec model so we can say:
> 	movq %fs:x@tpoff,%rax
>
> To load the contents of a per cpu variable x into %rax?

> If we can use that model it should make it easier to interface with things like
> the stack protector code. Although we would still need to be very careful
> about thread switches.

You mean cpu switches? We don't really have a notion of thread-local data in the kernel, other than things hanging off the kernel stack.
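For what it's worth, the local exec model you mention is easy to play with in userspace. A minimal sketch (the variable name is arbitrary; -ftls-model=local-exec or a plain non-PIC static link gives the same effect without the attribute; the kernel analogue would be %gs-based rather than %fs-based):

__thread long x __attribute__((tls_model("local-exec")));

long read_x(void)
{
	/* gcc emits a single thread-register-relative load here,
	 * essentially "movq %fs:x@tpoff, %rax". */
	return x;
}

That single-instruction form is the attraction; the catch is still the caching-across-preemption problem above.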

J