Re: [RFC v1 0/8] x86/init: Linux linker tables
From: H. Peter Anvin
Date: Tue Dec 15 2015 - 18:00:20 EST
On December 15, 2015 2:16:29 PM PST, "Luis R. Rodriguez" <mcgrof@xxxxxxxxxxxxxxxx> wrote:
>From: "Luis R. Rodriguez" <mcgrof@xxxxxxxx>
>
> A long time ago in a galaxy far,
> far away...
>
>Konrad Rzeszutek Wilk posted patches which eventually got merged to
>help
>with modularizing the IOMMUs we have on x86 [0]. This work was done due
>to
>the complex relationship that exists on IOMMUs and the requirements on
>careful execution. The solution also provided a mechanism which
>jettisoned
>unused IOMMUs during run-time.
>
>During review, even though the code was merged, hpa did note that we
>tend
>to encounter this type of problem "often enough that we should
>implement a
>generic facility for it" [1], hpa acknowledged that it obviously has to
>be
>based on sections and even noted that perhaps we might be able in the
>future to
>automate its creation. He noted that the gPXE folks had done just this
>with
>linker tables and suggested that "presumably we'd need a few different
>flavors
>for init tables and so on, but this would make it a generic mechanism."
>
>The IOMMU code got merged and this was left on someone's mental
>backburner.
>I've had an itch to scratch recently to try to avoid issues which are
>possible
>if one does not jettison other code carefully due to the large
>complexity of
>implicit dependencies of certain code on x86 in particular with
>possible dead
>code on x86 due to paravirtualization, and the IOMMU jettison strategy
>turned
>out to be my favorite solution so far. I've taken on hpa's suggestions
>from
>back in the day to review gPXE's solution to see if we could embrace it
>on
>Linux for a generic section solution to help jettison code carefully.
>
>What this patch set does exactly:
>
>This RFC patch set attempts to add support for such a generic solution.
>In the end, it turns out that the best solution possible was the best
>of
>both worlds: a combination of what Konrad had implemented in addition
>to
>what Michael Brown had implemented on the gXPE front. The IOMMU
>solution
>enables simple semantic annotations for dependency relationships, this
>however requires a run time sort. The gPXE solution grants the option
>to
>simply sort at build time. One of gPXE's solution primary goals however
>was
>also to help avoid bit-rot on code that's possible from #ifdef'ery. The
>Linux
>linker table solution enables developers to pick and choose what they
>need, with linker tables being the simplest solution. Contrary to gPXE
>which strives to force compilation of all linker table solutions we
>let developers pick *when* they want this as part of their solution.
>As can be seen from the suggested x86 init specific use of linker
>tables
>proposed you can also take advantage of both, linker sorting, optional
>compilation when needed (at developer's discretion), and even careful
>semantics annotation for dependency / relationship annotations.
>Although
>the x86 init solution here is heavily inspired by the IOMMU solution it
>diverges with strong semantics, and a new optional subarchitecture
>annotation. Sorting of init sequences is structure specific, as such
>each subsystem must defing their own solution unless semantics could
>be shared. I considered sharing semantics but in the end this proved
>pointless so this keeps things separate. A series of changes were made
>to the x86 init sequence in contrast to the IOMMU solution to be
>*extremely
>pedantic* on semantics, review of this changes can be studied on the
>table-init tree [2].
>
>Quick review of gPXE's solution and prospects on further changes:
>
>In my review from gPXE's solution it was not clear what hpa meant by
>gXPE folks having automated this process, they actually use linker
>tables
>all around, forcing compilation of *everything* and just do linking of
>enabled features at link time. You still need to build linker tables on
>your own. What I do see more potential for in the future is enabling to
>evolve stronger semantics over time, and this would also be subsystem
>specific.
>This will be evident in this patch set on the x86 init use of linker
>tables.
>I also see potential in strenghtening semantics for linker sorting, any
>of
>these types of features however would impose requiring newer binutils.
>For
>instance, gPXE's linker solution currently relies on SORT(), that
>defaults to
>SORT_BY_NAME(). This sorts lexicographically, gPXE's solution uses two
>digits
>to enable SORT_BY_NAME()'s lexicographical sort to sort orde by numeric
>priority. Since one is in control of order-level numbers one can
>provide
>guarantee that this sort should work as intended, however binutils also
>now has
>a SORT_BY_INIT_PRIORITY() which sorts specifically based on digits.
>SORT_BY_INIT_PRIORITY() was designed specifically for init_array
>sections
>though. Refer to the userspace mockup solution table-init git tree [2]
>commit
>6deba47ee1ad461e90 for more details on this. One thing I can envision
>to help
>here further are prospects for future linker enhancements, these
>however must
>be carefully considered in light of possible requirements of newer
>binutils
>and also of compatibility with other toolchains. For now we resort to
>the good
>'ol SORT() which even Linux has made use of for ages already. In that
>regards
>here, nothing new here. gPXE folks however did find some fun ICC
>compatiblity
>issues, and have also developed fixes for them, these were obviously
>carried
>over but should be reviewed carefully. Lastly, I should point out that
>essentially what we're developing are different forms loose and strong
>semantics -- in the most complex form what we really are after are
>feature
>graphs. Mauro explained to me that for media driver they needed to
>build a
>feature graphs, IMHO that could be a next level of strong semantics we
>might
>want to consider. For now though the combination of gPXE's linker-table
>solution based on linker order-level sort, and our old IOMMU init
>solution
>and its simple dependency map (and possible extensions, see
>subarchitecture)
>should likely suffice as a light weight solution for where we need
>semantics
>for at least on the x86 init path.
>
>Motivation and possible enhancements:
>
>To understand what made me tick to work on this feel free to read the
>dead
>code concerns with paravirtualization posts I've made, which also go
>into
>and detail Xen's alternative entry point [4] [5]. That's not the end of
>possibilities to help address to possible "dead code" on Linux, I have
>other
>ideas but I have a bit more work to do before publishing anything about
>it.
>If anything -- later work should likely supplement this solution
>further.
>
>Currently the earliest I was able to make use of boot_params on x86-64
>for
>linker tables was after load_idt(), so I've decided to stick with the
>earliest
>init call for x86 init sequences starting on
>x86_64_start_reservation().
>However, if we're able to make use of boot_params earlier (help
>appreciated),
>in particular just boot_params.hdr.hardware_subarch we should be able
>to make
>clearer and careful annotations on early x86-64 init code. Using the
>boot_params.hdr.hardware_subarch and requiring clear subarchitecture
>support
>annotations on early init code should provide a *proactive* means to
>avoid issues
>such as the cr4 shadow oversight [6] which later caused crashes on Xen.
>The
>latest similar type of issue is when KAsan was introduced, after it was
>integrated I suspected KAsan would probably break if enabled on Xen, I
>reported
>this in March 2015 [7] and Andrey confirmed this might be the case but
>since he
>wanted it enabled for allyesconfig and allmodconfig he could not think
>of a
>clean way to disable it or address support for it then [8]. During the
>development of this patch set I confirmed KAsan crashes Xen dom0 and
>therefore
>should obviously crash Xen PV guests as well. Using the same kernel
>KAsan
>still worked on KVM and bare-metal. Provided we had early access to
>boot_params.hdr.hardware_subarch, in x86_64_start_kernel() we could
>annotate
>kasan_early_init() and friends not supported on Xen, however since we
>don't
>have a way to disable KAsan at run time at this time we wouldn't have
>any other
>option but to crash early. Because of this though we should perhaps
>just consider
>disabling KAsan for Xen configs and KAsan folks might just have to bite
>the
>bullet. If KAsan folks are gung-ho about not disabling KAsan when Xen
>is
>enabled and if its easier to add support to disable KAsan at run time
>than
>it is to add KAsan support for Xen one alternative might be to use
>linker
>tables to annotate this and disable KAsan at run time for Xen.
>
>One of the points of the use of the x86 linker table solution and
>specific
>semantics there in is to *force* developers to think about requirements
>on x86
>carefully. So for instance by requiring itemizing supported
>subarchitectures
>*early* on in development we should be able to avoid proactively issues
>such as
>the crash due to the cr4 shadow changes not added on the Xen entry
>path, and
>also missing the requirement to develop a solution for Kasan for Xen.
>The
>proposed x86 linker table solution is not the first to use
>boot_params.hdr.hardware_subarch, its first use was actually on i386,
>and for
>lguest. We take advantage of it to avoid further extending pv_ops, and
>*actually* to see if we could simplify pv_ops over time further. This
>patch set
>also includes a renaming of paravirt_enabled() to paravirt_legacy()
>
>An unexpected long term side goal which *might* be possible due to
>linker
>tables on x86 is to help unify the different C entry points for Linux
>x86-64.
>I've taken a stab at it after these patches [9] but it fails on Xen,
>likely
>because the stack is not set up right due to the different calls /
>argument
>requirements. If this is desirable folks could take a look and perhaps
>help
>on that front.
>
>Where to get code alternatively and testing:
>
>All of these patches are RFCs, if anything the only patches worth
>consdering
>merging sooner rather than later might be "paravirt: rename
>paravirt_enabled
>to paravirt_legacy" and "x86/boot: add BIT() to boot/bitops.h". After
>review
>I can send those separately in patch form if agreeable. Since these are
>all
>just RFCs I've based these patches on top of Linus' v4.4-rc5. If you
>want all
>patches in one file you can get them here [10], and a respective
>linux-next
>version which applies on top of next-20151215 here [11]. I also have
>two
>git trees set up with branches for this code, one based on Linus' tree
>[12]
>and another based on linux-next next-20151215 [13]. I'll see what
>issues
>zero-day bot testing finds out. I've just run time tested this on
>x86-64
>bare metal, KVM, and Xen dom0 so far.
>
>[0] https://marc.info/?l=linux-kernel&m=128284562303565&w=2
>[1] https://marc.info/?l=linux-kernel&m=128285216913266&w=2
>[2] https://github.com/mcgrof/table-init/
>[3]
>https://github.com/mcgrof/table-init/commit/6deba47ee1ad461e90e0fbba226a535cfc1c58f3
>[4]
>http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
>[5]
>http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html
>[6]
>http://lists.xenproject.org/archives/html/xen-devel/2015-02/msg02742.html
>[7]
>http://lkml.kernel.org/r/CAB=NE6Xs5fepzNtymzT4CueeJZ0KMPETpda114DpL4eMtDswtw@xxxxxxxxxxxxxx
>[8] http://lkml.kernel.org/r/54F5B3DA.70203@xxxxxxxxxxx
>[9]
>http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/x86-merge-x86-init-v1.patch
>[10]
>http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-20151215-rfc-v1-linker-tables.patch
>[11]
>http://drvbp1.linux-foundation.org/~mcgrof/patches/2015/12/15/pend-next-20151215-rfc-v1-linker-tables.patch
>[12]
>https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20151215-rfc-v1-linker-tables
>[13]
>https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20151215-rfc-v1-linker-tables
>
>Luis R. Rodriguez (8):
> paravirt: rename paravirt_enabled to paravirt_legacy
> tables.h: add linker table support
> x86/boot: add BIT() to boot/bitops.h
> x86/init: add linker table support
> x86/init: move ebda reservations into linker table
> x86/init: use linker table for i386 early setup
> x86/init: user linker table for ce4100 early setup
> x86/init: use linker table for mid early setup
>
> Documentation/kbuild/makefiles.txt | 19 ++
> arch/x86/Kconfig.debug | 47 ++++
> arch/x86/boot/bitops.h | 2 +
> arch/x86/boot/boot.h | 2 +-
> arch/x86/include/asm/bios_ebda.h | 2 -
> arch/x86/include/asm/paravirt.h | 4 +-
> arch/x86/include/asm/paravirt_types.h | 11 +-
> arch/x86/include/asm/processor.h | 2 +-
> arch/x86/include/asm/setup.h | 12 -
> arch/x86/include/asm/x86_init.h | 1 +
> arch/x86/include/asm/x86_init_fn.h | 267 ++++++++++++++++++++
> arch/x86/kernel/Makefile | 4 +-
> arch/x86/kernel/apm_32.c | 2 +-
> arch/x86/kernel/asm-offsets.c | 2 +-
> arch/x86/kernel/cpu/intel.c | 2 +-
> arch/x86/kernel/cpu/microcode/core.c | 2 +-
> arch/x86/kernel/dbg-tables/Makefile | 18 ++
> arch/x86/kernel/dbg-tables/alpha.c | 10 +
> arch/x86/kernel/dbg-tables/beta.c | 18 ++
> arch/x86/kernel/dbg-tables/delta.c | 10 +
> arch/x86/kernel/dbg-tables/gamma.c | 18 ++
> arch/x86/kernel/dbg-tables/gamma.h | 3 +
> arch/x86/kernel/{head.c => ebda.c} | 6 +-
> arch/x86/kernel/head32.c | 22 +-
> arch/x86/kernel/head64.c | 4 +-
> arch/x86/kernel/init.c | 55 +++++
> arch/x86/kernel/kvm.c | 9 +-
> arch/x86/kernel/paravirt.c | 2 +-
> arch/x86/kernel/sort-init.c | 114 +++++++++
> arch/x86/kernel/tboot.c | 2 +-
> arch/x86/kernel/vmlinux.lds.S | 25 ++
> arch/x86/lguest/boot.c | 2 +-
> arch/x86/mm/dump_pagetables.c | 2 +-
> arch/x86/platform/ce4100/ce4100.c | 4 +-
> arch/x86/platform/intel-mid/intel-mid.c | 4 +-
> arch/x86/tools/relocs.c | 1 +
> arch/x86/xen/enlighten.c | 2 +-
> drivers/pnp/pnpbios/core.c | 2 +-
>include/linux/tables.h | 421
>++++++++++++++++++++++++++++++++
> scripts/Makefile.build | 4 +-
> scripts/Makefile.clean | 1 +
> scripts/Makefile.lib | 12 +
> 42 files changed, 1091 insertions(+), 61 deletions(-)
> create mode 100644 arch/x86/include/asm/x86_init_fn.h
> create mode 100644 arch/x86/kernel/dbg-tables/Makefile
> create mode 100644 arch/x86/kernel/dbg-tables/alpha.c
> create mode 100644 arch/x86/kernel/dbg-tables/beta.c
> create mode 100644 arch/x86/kernel/dbg-tables/delta.c
> create mode 100644 arch/x86/kernel/dbg-tables/gamma.c
> create mode 100644 arch/x86/kernel/dbg-tables/gamma.h
> rename arch/x86/kernel/{head.c => ebda.c} (94%)
> create mode 100644 arch/x86/kernel/init.c
> create mode 100644 arch/x86/kernel/sort-init.c
> create mode 100644 include/linux/tables.h
By the way, please refer to iPXE as gPXE is a dead project.
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/