[RFC v1 0/8] x86/init: Linux linker tables

From: Luis R. Rodriguez
Date: Tue Dec 15 2015 - 17:16:50 EST


From: "Luis R. Rodriguez" <mcgrof@xxxxxxxx>

A long time ago in a galaxy far,
far away...

Konrad Rzeszutek Wilk posted patches which eventually got merged to help
with modularizing the IOMMUs we have on x86 [0]. This work was done due to
the complex relationship that exists on IOMMUs and the requirements on
careful execution. The solution also provided a mechanism which jettisoned
unused IOMMUs during run-time.

During review, even though the code was merged, hpa did note that we tend
to encounter this type of problem "often enough that we should implement a
generic facility for it" [1]. hpa acknowledged that it obviously has to be
based on sections and even noted that perhaps we might be able in the future to
automate its creation. He noted that the gPXE folks had done just this with
linker tables and suggested that "presumably we'd need a few different flavors
for init tables and so on, but this would make it a generic mechanism."

The IOMMU code got merged and this was left on someone's mental backburner.
Recently I've had an itch to scratch: avoiding the issues that are possible
when code is not jettisoned carefully. Certain x86 code has a large number of
implicit dependencies, and paravirtualization in particular can leave dead
code behind on x86. The IOMMU jettison strategy turned out to be my favorite
solution so far. I've taken up hpa's suggestion from back in the day and
reviewed gPXE's solution to see if we could embrace it on Linux as a generic
section solution to help jettison code carefully.

What this patch set does exactly:

This RFC patch set attempts to add support for such a generic solution.
In the end, it turns out that the best solution possible was the best of
both worlds: a combination of what Konrad had implemented in addition to
what Michael Brown had implemented on the gPXE front. The IOMMU solution
enables simple semantic annotations for dependency relationships; this
however requires a run time sort. The gPXE solution grants the option to
simply sort at build time. One of the primary goals of gPXE's solution however
was also to help avoid the bit-rot that #ifdef'ery makes possible. The Linux
linker table solution enables developers to pick and choose what they
need, with linker tables being the simplest solution. Contrary to gPXE,
which strives to force compilation of all linker table solutions, we
let developers pick *when* they want this as part of their solution.
As can be seen from the suggested x86 init specific use of linker tables,
you can also take advantage of all of these together: linker sorting, optional
compilation when needed (at developer's discretion), and even careful
semantic annotations for dependencies / relationships. Although
the x86 init solution here is heavily inspired by the IOMMU solution, it
diverges with strong semantics and a new optional subarchitecture
annotation. Sorting of init sequences is structure specific, as such
each subsystem must define its own solution unless semantics can
be shared. I considered sharing semantics but in the end this proved
pointless so this keeps things separate. A series of changes were made
to the x86 init sequence in contrast to the IOMMU solution to be *extremely
pedantic* on semantics; these changes can be reviewed on the
table-init tree [2].
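
To make the basic linker table idea above concrete, here is a minimal
userspace sketch. It is not the tables.h API from this series. It assumes GCC
or clang with GNU ld on an ELF target, where the linker provides
__start_<name> and __stop_<name> symbols for any section whose name is a valid
C identifier. Every demo_* / DEMO_* name is made up for illustration, and no
sorting is done here, so the order in which entries run is whatever the
toolchain emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical entry type; the real patches define their own structures. */
struct demo_init_fn {
	const char *name;
	void (*init)(void);
};

/*
 * Place a pointer to each entry into one named section. Pointer-sized
 * elements keep the section densely packed, so it can be walked as an
 * array between the linker-provided start and stop symbols.
 */
#define DEMO_INIT_FN(fn)						\
	static const struct demo_init_fn demo_entry_##fn = {		\
		.name = #fn, .init = fn,				\
	};								\
	static const struct demo_init_fn *const demo_ptr_##fn		\
	__attribute__((used, section("demo_init_tbl"))) = &demo_entry_##fn

/* GNU ld defines these because "demo_init_tbl" is a valid C identifier. */
extern const struct demo_init_fn *const __start_demo_init_tbl[];
extern const struct demo_init_fn *const __stop_demo_init_tbl[];

static int demo_calls;

static void demo_init_alpha(void) { demo_calls++; }
static void demo_init_beta(void)  { demo_calls++; }

DEMO_INIT_FN(demo_init_alpha);
DEMO_INIT_FN(demo_init_beta);

/* Walk the table and call every entry; returns how many entries ran. */
int demo_run_init_tbl(void)
{
	const struct demo_init_fn *const *p;
	int n = 0;

	for (p = __start_demo_init_tbl; p < __stop_demo_init_tbl; p++) {
		assert((*p)->init != NULL);
		printf("init: %s\n", (*p)->name);
		(*p)->init();
		n++;
	}
	return n;
}
```

Calling demo_run_init_tbl() from main() runs both registered entries. Adding
a new init routine only takes another DEMO_INIT_FN() line next to the routine
itself; the walker never changes.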

Quick review of gPXE's solution and prospects on further changes:

In my review of gPXE's solution it was not clear what hpa meant by
gPXE folks having automated this process. They actually use linker tables
all around, forcing compilation of *everything*, and just link in the
enabled features at link time. You still need to build linker tables on
your own. What I do see more potential for in the future is evolving
stronger semantics over time, and this would also be subsystem specific.
This will be evident in this patch set on the x86 init use of linker tables.
I also see potential in strengthening semantics for linker sorting; any of
these types of features however would require newer binutils. For
instance, gPXE's linker solution currently relies on SORT(), which defaults to
SORT_BY_NAME(). This sorts lexicographically; gPXE's solution uses two digits
to enable SORT_BY_NAME()'s lexicographical sort to sort order by numeric
priority. Since one is in control of order-level numbers one can
guarantee that this sort works as intended, however binutils also now has
a SORT_BY_INIT_PRIORITY() which sorts specifically based on digits.
SORT_BY_INIT_PRIORITY() was designed specifically for init_array sections
though. Refer to the userspace mockup solution table-init git tree [2], commit
6deba47ee1ad461e90 [3], for more details on this. One thing I can envision to help
here further are prospects for future linker enhancements; these however must
be carefully considered in light of possible requirements of newer binutils
and also of compatibility with other toolchains. For now we resort to the good
'ol SORT() which even Linux has made use of for ages already. In that regard
there is nothing new here. gPXE folks however did find some fun ICC
compatibility issues, and have also developed fixes for them; these were
obviously carried over but should be reviewed carefully. Lastly, I should point
out that essentially what we're developing are different forms of loose and
strong semantics -- in the most complex form what we really are after are
feature graphs. Mauro explained to me that for media drivers they needed to
build feature graphs; IMHO that could be a next level of strong semantics we
might want to consider. For now though the combination of gPXE's linker-table
solution based on linker order-level sort, and our old IOMMU init solution
and its simple dependency map (and possible extensions, see subarchitecture)
should likely suffice as a lightweight solution where we need semantics,
at least on the x86 init path.
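
The two-digit order-level trick discussed above can be illustrated without a
linker at all. The sketch below mimics a SORT_BY_NAME() style sort with
qsort() and strcmp() to show why the order level must be zero padded for a
lexicographical sort to match numeric priority. The section names and all
demo_* helpers are hypothetical; neither gPXE nor this series is claimed to
use exactly this naming.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: plain byte-wise name comparison, as a name sort does. */
static int demo_cmp_names(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return strcmp(*sa, *sb);
}

/* Sort an array of section names in place, lexicographically. */
void demo_sort_by_name(const char **names, size_t n)
{
	assert(names != NULL);
	qsort(names, n, sizeof(*names), demo_cmp_names);
}

/* Returns 1 if order level 2 ends up before order level 10, else 0. */
int demo_level2_sorts_first(int zero_padded)
{
	/* Hypothetical section names carrying the order level as a suffix. */
	const char *padded[]   = { ".tbl.demo.10", ".tbl.demo.02" };
	const char *unpadded[] = { ".tbl.demo.10", ".tbl.demo.2" };
	const char **names = zero_padded ? padded : unpadded;
	const char *level2 = zero_padded ? ".tbl.demo.02" : ".tbl.demo.2";

	demo_sort_by_name(names, 2);
	return strcmp(names[0], level2) == 0;
}
```

With zero padding "02" sorts ahead of "10" as intended. Without it "10" sorts
ahead of "2", since the comparison stops at the first differing character.
This is why controlling the width of the order-level numbers is enough to make
a plain name sort safe.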

Motivation and possible enhancements:

To understand what made me tick to work on this feel free to read the posts
I've made on dead code concerns with paravirtualization, which also go into
detail on Xen's alternative entry point [4] [5]. That's not the end of
possibilities to help address possible "dead code" on Linux, I have other
ideas but I have a bit more work to do before publishing anything about it.
If anything -- later work should likely supplement this solution further.

Currently the earliest I was able to make use of boot_params on x86-64 for
linker tables was after load_idt(), so I've decided to stick with the earliest
init call for x86 init sequences starting on x86_64_start_reservation().
However, if we're able to make use of boot_params earlier (help appreciated),
in particular just boot_params.hdr.hardware_subarch, we should be able to make
clearer and more careful annotations on early x86-64 init code. Using the
boot_params.hdr.hardware_subarch and requiring clear subarchitecture support
annotations on early init code should provide a *proactive* means to avoid issues
such as the cr4 shadow oversight [6] which later caused crashes on Xen. The
latest similar type of issue came up when KAsan was introduced. After it was
integrated I suspected KAsan would probably break if enabled on Xen. I reported
this in March 2015 [7] and Andrey confirmed this might be the case, but since he
wanted it enabled for allyesconfig and allmodconfig he could not think of a
clean way to disable it or address support for it then [8]. During the
development of this patch set I confirmed KAsan crashes Xen dom0 and therefore
should obviously crash Xen PV guests as well. Using the same kernel KAsan
still worked on KVM and bare-metal. Provided we had early access to
boot_params.hdr.hardware_subarch, in x86_64_start_kernel() we could annotate
kasan_early_init() and friends as not supported on Xen, however since we don't
have a way to disable KAsan at run time at this time we wouldn't have any other
option but to crash early. Because of this though we should perhaps just consider
disabling KAsan for Xen configs and KAsan folks might just have to bite the
bullet. If KAsan folks are gung-ho about not disabling KAsan when Xen is
enabled and if it's easier to add support to disable KAsan at run time than
it is to add KAsan support for Xen, one alternative might be to use linker
tables to annotate this and disable KAsan at run time for Xen.

One of the points of the use of the x86 linker table solution and the specific
semantics therein is to *force* developers to think about requirements on x86
carefully. So for instance by requiring itemizing supported subarchitectures
*early* on in development we should be able to proactively avoid issues such as
the crash due to the cr4 shadow changes not added on the Xen entry path, and
also missing the requirement to develop a solution for KAsan for Xen. The
proposed x86 linker table solution is not the first to use
boot_params.hdr.hardware_subarch; its first use was actually on i386, and for
lguest. We take advantage of it to avoid further extending pv_ops, and
*actually* to see if we could simplify pv_ops over time further. This patch set
also includes a renaming of paravirt_enabled() to paravirt_legacy().
An unexpected long term side goal which *might* be possible due to linker
tables on x86 is to help unify the different C entry points for Linux x86-64.
I've taken a stab at it after these patches [9] but it fails on Xen, likely
because the stack is not set up right due to the different calls / argument
requirements. If this is desirable folks could take a look and perhaps help
on that front.

Where to get code alternatively and testing:

All of these patches are RFCs. If anything, the only patches worth considering
merging sooner rather than later might be "paravirt: rename paravirt_enabled
to paravirt_legacy" and "x86/boot: add BIT() to boot/bitops.h". After review
I can send those separately in patch form if agreeable. Since these are all
just RFCs I've based these patches on top of Linus' v4.4-rc5. If you want all
patches in one file you can get them here [10], and a respective linux-next
version which applies on top of next-20151215 here [11]. I also have two
git trees set up with branches for this code, one based on Linus' tree [12]
and another based on linux-next next-20151215 [13]. I'll see what issues
zero-day bot testing finds. So far I've only run time tested this on x86-64
bare metal, KVM, and Xen dom0.

[0] https://marc.info/?l=linux-kernel&m=128284562303565&w=2
[1] https://marc.info/?l=linux-kernel&m=128285216913266&w=2
[2] https://github.com/mcgrof/table-init/
[3] https://github.com/mcgrof/table-init/commit/6deba47ee1ad461e90e0fbba226a535cfc1c58f3
[4] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
[5] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html
[6] http://lists.xenproject.org/archives/html/xen-devel/2015-02/msg02742.html
[7] http://lkml.kernel.org/r/CAB=NE6Xs5fepzNtymzT4CueeJZ0KMPETpda114DpL4eMtDswtw@xxxxxxxxxxxxxx
[8] http://lkml.kernel.org/r/54F5B3DA.70203@xxxxxxxxxxx
[9] http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/x86-merge-x86-init-v1.patch
[10] http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-20151215-rfc-v1-linker-tables.patch
[11] http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-next-20151215-rfc-v1-linker-tables.patch
[12] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20151215-rfc-v1-linker-tables
[13] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20151215-rfc-v1-linker-tables

Luis R. Rodriguez (8):
paravirt: rename paravirt_enabled to paravirt_legacy
tables.h: add linker table support
x86/boot: add BIT() to boot/bitops.h
x86/init: add linker table support
x86/init: move ebda reservations into linker table
x86/init: use linker table for i386 early setup
x86/init: user linker table for ce4100 early setup
x86/init: use linker table for mid early setup

Documentation/kbuild/makefiles.txt | 19 ++
arch/x86/Kconfig.debug | 47 ++++
arch/x86/boot/bitops.h | 2 +
arch/x86/boot/boot.h | 2 +-
arch/x86/include/asm/bios_ebda.h | 2 -
arch/x86/include/asm/paravirt.h | 4 +-
arch/x86/include/asm/paravirt_types.h | 11 +-
arch/x86/include/asm/processor.h | 2 +-
arch/x86/include/asm/setup.h | 12 -
arch/x86/include/asm/x86_init.h | 1 +
arch/x86/include/asm/x86_init_fn.h | 267 ++++++++++++++++++++
arch/x86/kernel/Makefile | 4 +-
arch/x86/kernel/apm_32.c | 2 +-
arch/x86/kernel/asm-offsets.c | 2 +-
arch/x86/kernel/cpu/intel.c | 2 +-
arch/x86/kernel/cpu/microcode/core.c | 2 +-
arch/x86/kernel/dbg-tables/Makefile | 18 ++
arch/x86/kernel/dbg-tables/alpha.c | 10 +
arch/x86/kernel/dbg-tables/beta.c | 18 ++
arch/x86/kernel/dbg-tables/delta.c | 10 +
arch/x86/kernel/dbg-tables/gamma.c | 18 ++
arch/x86/kernel/dbg-tables/gamma.h | 3 +
arch/x86/kernel/{head.c => ebda.c} | 6 +-
arch/x86/kernel/head32.c | 22 +-
arch/x86/kernel/head64.c | 4 +-
arch/x86/kernel/init.c | 55 +++++
arch/x86/kernel/kvm.c | 9 +-
arch/x86/kernel/paravirt.c | 2 +-
arch/x86/kernel/sort-init.c | 114 +++++++++
arch/x86/kernel/tboot.c | 2 +-
arch/x86/kernel/vmlinux.lds.S | 25 ++
arch/x86/lguest/boot.c | 2 +-
arch/x86/mm/dump_pagetables.c | 2 +-
arch/x86/platform/ce4100/ce4100.c | 4 +-
arch/x86/platform/intel-mid/intel-mid.c | 4 +-
arch/x86/tools/relocs.c | 1 +
arch/x86/xen/enlighten.c | 2 +-
drivers/pnp/pnpbios/core.c | 2 +-
include/linux/tables.h | 421 ++++++++++++++++++++++++++++++++
scripts/Makefile.build | 4 +-
scripts/Makefile.clean | 1 +
scripts/Makefile.lib | 12 +
42 files changed, 1091 insertions(+), 61 deletions(-)
create mode 100644 arch/x86/include/asm/x86_init_fn.h
create mode 100644 arch/x86/kernel/dbg-tables/Makefile
create mode 100644 arch/x86/kernel/dbg-tables/alpha.c
create mode 100644 arch/x86/kernel/dbg-tables/beta.c
create mode 100644 arch/x86/kernel/dbg-tables/delta.c
create mode 100644 arch/x86/kernel/dbg-tables/gamma.c
create mode 100644 arch/x86/kernel/dbg-tables/gamma.h
rename arch/x86/kernel/{head.c => ebda.c} (94%)
create mode 100644 arch/x86/kernel/init.c
create mode 100644 arch/x86/kernel/sort-init.c
create mode 100644 include/linux/tables.h

--
2.6.2
