[RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies

From: Luis R. Rodriguez
Date: Fri Feb 19 2016 - 09:16:18 EST


Any failure on the x86 init path can be catastrophic.
A simple shift of a call from one place to another can
easily break things. Likewise, adding a new call to
one path without considering all x86 requirements
can make certain x86 run time environments crash.
We currently account for these requirements through
peer code review and run time testing. We could do
much better if we had a clean and simple way to annotate
strong semantics for run time requirements, init sequence
dependencies, and detection mechanisms for additions of
new x86 init sequences.

This adds support for linker tables on x86 in order
to help define strong semantics for x86 init sequences
and, as a side benefit, to help simplify the x86 init sequence.
The new struct x86_init_fn is inspired by iPXE's sample
definitions, and also heavily by Linux's own IOMMU
initialization solution, which extends initialization
semantics with custom per-routine detection routines and
with annotations for init routine dependency mapping. A new
feature of this design is that it builds upon the hardware
subarchitecture field added to the x86 boot protocol 2.07 in
2007 by Rusty, which has never really been taken advantage of
except by lguest. We use it as a stop-gap for new x86 features
which have failed to take into consideration the two possible
x86-64 entry points that exist because of paravirtualization
yielding requirements. The hardware subarchitecture could also
potentially be used in the future to help unify the Linux x86
entry points. The current disjoint entry points for x86-64 can
be summarized as follows:

  Bare metal, KVM, Xen HVM                      Xen PV / dom0
      startup_64()                             startup_xen()
             \                                     /
     x86_64_start_kernel()                 xen_start_kernel()
                          \               /
                     x86_64_start_reservations()
                                  |
                            start_kernel()
                            [ ... ]
                            [ setup_arch() ]
                            [ ... ]
                                init

By virtue of using linker tables we also gain the ability to
sort init sequences at link time, through the use of order
levels and the linker's SORT(), and to enable init sequence
dependency sorting only when dependency semantics are actually
needed. Using linker tables also lets us avoid #ifdef'ery
around select code when desirable; this is completely optional,
but it is a feature we inherit from linker tables.

For *new x86 features* this enables strong semantics to be
considered. For *existing x86 feature code* this enables the
opportunity to clarify requirements and dependencies,
which ultimately also provides a new way to ensure that code
that should not run never runs, that is, it provides one
mechanism to help prevent dead code. For a more elaborate
description of what dead code is exactly, and how it relates
to init sequences, in particular for Xen given its distinct
init sequence, refer to [0] and [1].

Four debug x86 features are included to help test x86 linker
table support; enable CONFIG_X86_DEBUG_LINKER_TABLES to get
them. Features listed with table-$(CONFIG-x) will *always* be
compiled, but only linked in when enabled. Features listed with
the good ol' obj-y will only be compiled and linked in when
enabled, as we're used to. If CONFIG_X86_DEBUG_LINKER_TABLES is
disabled nothing currently runs, as there are no init sequences
using any of this yet.

The goal is to evolve the semantics carefully as needed. We start
with one basic callback, early_init(), called early in
x86_64_start_reservations() (and in i386_start_kernel() on 32-bit).

v2:

- port to the new linker tables, as discussed
- since the start and end of a table are now the empty string and
  the "~" string, we can use any digit range for our order levels.
  Because of this, bump the x86 order level range from 01-99
  to 0000-9999. We can arbitrarily change this later, but this also
  gives us plenty of legroom for easy adjustments.
- since we now use the basic .init section we get coverage
  support from modpost to check section mismatch issues for us! This
  also means we had to peg __ref on a few of our callers which use
  .init section code.
- since we're using the standard .init section we can now drop our
  custom modifications of both arch/x86/tools/relocs.c and
  arch/x86/kernel/vmlinux.lds.S! To be clear, we don't need any
  further linker script hacks when making use of linker tables.

[0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
[1] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html

Signed-off-by: Luis R. Rodriguez <mcgrof@xxxxxxxxxx>
---
arch/x86/Kconfig.debug | 47 +++++++
arch/x86/include/asm/x86_init.h | 1 +
arch/x86/include/asm/x86_init_fn.h | 263 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/dbg-tables/Makefile | 18 +++
arch/x86/kernel/dbg-tables/alpha.c | 10 ++
arch/x86/kernel/dbg-tables/beta.c | 18 +++
arch/x86/kernel/dbg-tables/delta.c | 10 ++
arch/x86/kernel/dbg-tables/gamma.c | 18 +++
arch/x86/kernel/dbg-tables/gamma.h | 3 +
arch/x86/kernel/head32.c | 4 +
arch/x86/kernel/head64.c | 4 +
arch/x86/kernel/init.c | 55 ++++++++
arch/x86/kernel/sort-init.c | 114 ++++++++++++++++
14 files changed, 567 insertions(+)
create mode 100644 arch/x86/include/asm/x86_init_fn.h
create mode 100644 arch/x86/kernel/dbg-tables/Makefile
create mode 100644 arch/x86/kernel/dbg-tables/alpha.c
create mode 100644 arch/x86/kernel/dbg-tables/beta.c
create mode 100644 arch/x86/kernel/dbg-tables/delta.c
create mode 100644 arch/x86/kernel/dbg-tables/gamma.c
create mode 100644 arch/x86/kernel/dbg-tables/gamma.h
create mode 100644 arch/x86/kernel/init.c
create mode 100644 arch/x86/kernel/sort-init.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 9b18ed97a8a2..af5582d33dd8 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -383,4 +383,51 @@ config PUNIT_ATOM_DEBUG
The current power state can be read from
/sys/kernel/debug/punit_atom/dev_power_state

+config X86_DEBUG_LINKER_TABLES
+ bool "x86 linker table debug"
+ depends on DEBUG_KERNEL
+ default n
+ ---help---
+ Enabling this enables debugging of the x86 linker table support
+ through a set of debug features. By default it enables the "x86 beta"
+ and "x86 gamma" built-in features. Both the "x86 alpha" and "x86 delta"
+ built-in features are left disabled. You should see all tables
+ compiled except delta, but only the beta and gamma features will be
+ linked in. The delta feature will be compiled and linked in
+ if and only if it is enabled the old way.
+
+ For more details on these linker tables refer to:
+
+ include/linux/tables.h
+
+ If unsure, say N.
+
+config X86_DEBUG_LINKER_TABLE_ALPHA
+ bool "x86 linker table alpha"
+ default n
+ depends on X86_DEBUG_LINKER_TABLES
+ ---help---
+ Enabling this should enable the linker table "x86 alpha" feature.
+
+config X86_DEBUG_LINKER_TABLE_BETA
+ bool "x86 linker table beta"
+ default y
+ depends on X86_DEBUG_LINKER_TABLES
+ ---help---
+ Enabling this should enable the linker table "x86 beta" feature.
+
+config X86_DEBUG_LINKER_TABLE_GAMMA
+ bool "x86 linker table gamma"
+ default y
+ depends on X86_DEBUG_LINKER_TABLES
+ ---help---
+ Enabling this should enable the linker table "x86 gamma" feature.
+
+config X86_DEBUG_LINKER_TABLE_DELTA
+ bool "x86 linker table delta"
+ default n
+ depends on X86_DEBUG_LINKER_TABLES
+ ---help---
+ Enabling this should enable the linker table "x86 delta" feature.
+
endmenu
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 1ae89a2721d6..0df68c814147 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -2,6 +2,7 @@
#define _ASM_X86_PLATFORM_H

#include <asm/bootparam.h>
+#include <asm/x86_init_fn.h>

struct mpc_bus;
struct mpc_cpu;
diff --git a/arch/x86/include/asm/x86_init_fn.h b/arch/x86/include/asm/x86_init_fn.h
new file mode 100644
index 000000000000..1521a546c734
--- /dev/null
+++ b/arch/x86/include/asm/x86_init_fn.h
@@ -0,0 +1,263 @@
+#ifndef __X86_INIT_TABLES_H
+#define __X86_INIT_TABLES_H
+
+#include <linux/types.h>
+#include <linux/tables.h>
+#include <linux/init.h>
+#include <linux/bitops.h>
+
+/**
+ * struct x86_init_fn - x86 generic kernel init call
+ *
+ * Linux x86 features vary in complexity; features may require work done at
+ * different levels of the full x86 init sequence. Today there are also two
+ * different possible entry points for Linux on x86, one for bare metal, KVM
+ * and Xen HVM, and another for Xen PV guests / dom0. Assuming a bootloader
+ * has set up 64-bit mode, roughly the x86 init sequence follows this path:
+ *
+ *   Bare metal, KVM, Xen HVM                      Xen PV / dom0
+ *       startup_64()                             startup_xen()
+ *              \                                     /
+ *      x86_64_start_kernel()                 xen_start_kernel()
+ *                           \               /
+ *                      x86_64_start_reservations()
+ *                                   |
+ *                             start_kernel()
+ *                             [ ... ]
+ *                             [ setup_arch() ]
+ *                             [ ... ]
+ *                                 init
+ *
+ * x86_64_start_kernel() and xen_start_kernel() are the respective first C code
+ * entry points. The different entry points exist to enable Xen to
+ * skip a lot of hardware setup already done and managed on behalf of the
+ * hypervisor; we refer to this as "paravirtualization yielding". The different
+ * levels of init calls on the x86 init sequence exist to account for these
+ * slight differences and requirements. These different entry points also share
+ * a common x86-specific entry path, x86_64_start_reservations().
+ *
+ * A generic x86 feature can have different initialization calls, one on each
+ * of the different main x86 init sequences, but must also address both entry
+ * points in order to work properly across the board on all supported x86
+ * subarchitectures. Since x86 features can also have dependencies on other
+ * setup code or features, x86 features can at times be subordinate to other
+ * x86 features, or conditions. struct x86_init_fn enables feature developers
+ * to annotate dependency relationships to ensure subsequent init calls only
+ * run once a subordinate's dependencies have run. When needed, custom
+ * dependency requirements can also be spelled out through a custom dependency
+ * checker. In order to account for the dual entry point nature of x86-64 Linux
+ * for "paravirtualization yielding", and to make support annotations for
+ * these explicit, each struct x86_init_fn must specify its supported
+ * subarchitectures. However, the earliest x86-64 code can read the
+ * subarchitecture is after load_idt(), so the earliest point at which we can
+ * currently rely on the subarchitecture for semantics and a common init
+ * sequence is the shared x86_64_start_reservations(). Each struct x86_init_fn
+ * must also declare an order level to impose an ordering relative to other
+ * features when required.
+ *
+ * x86_init_fn enables strong semantics and dependencies to be defined and
+ * implemented on the full x86 initialization sequence.
+ *
+ * @order_level: must be set; the linker order level. This corresponds to the
+ * table section sub-table index; we record it only for semantic validation
+ * purposes. An order level is always required, however you would typically
+ * only use x86_init_normal*() and leave ordering to the placement
+ * of code in a C file and the order of objects in a Makefile. Custom
+ * order levels can be used when C file placement and Makefile object order
+ * do not suffice or further refinement is needed.
+ * @supp_hardware_subarch: must be set, it represents the bitmask of supported
+ * subarchitectures. We require each struct x86_init_fn to have this set
+ * to require developer considerations for each supported x86
+ * subarchitecture and to build strong annotations of different possible
+ * run time states particularly in consideration for the two main
+ * different entry points for x86 Linux, to account for paravirtualization
+ * yielding.
+ *
+ * The subarchitecture is read by the kernel at early boot from the
+ * struct boot_params hardware_subarch. Support for the subarchitecture
+ * exists as of x86 boot protocol 2.07. The bootloader would have set up
+ * the respective hardware_subarch on the boot sector as per
+ * Documentation/x86/boot.txt.
+ *
+ * Which x86 entry point is used is determined at run time by the
+ * bootloader. Linux pv_ops was designed to enable building one Linux
+ * binary that supports bare metal and different hypervisors. pv_ops setup
+ * code is limited, however, in that all of it runs late in
+ * the x86 init sequence, during setup_arch(). In fact cpu_has_hypervisor
+ * only works after early_cpu_init() during setup_arch(). If an x86
+ * feature requires an earlier determination of what hypervisor was used,
+ * or if it needs to annotate only support for certain hypervisors, the
+ * x86 hardware_subarch should be set by the bootloader and
+ * @supp_hardware_subarch set by the x86 feature. Using hardware_subarch
+ * enables x86 features to fill the semantic gap between the Linux x86
+ * entry point used and what pv_ops has to offer through a hypervisor
+ * agnostic mechanism.
+ *
+ * Each supported subarchitecture is set using the respective
+ * X86_SUBARCH_* as a bit in the bitmask. For instance if a feature
+ * is supported on PC and Xen subarchitectures only you would set this
+ * bitmask to:
+ *
+ * BIT(X86_SUBARCH_PC) |
+ * BIT(X86_SUBARCH_XEN);
+ *
+ * @detect: optional, if set returns true if the feature has been detected to
+ * be required, it returns false if the feature has been detected to not
+ * be required.
+ * @depend: optional; if set, it must point to the @detect routine of the
+ * init sequence this one depends on, which must then be called prior to
+ * this init sequence. This is only used for sorting purposes given
+ * all current init callbacks have a void return type. Sorting is
+ * implemented via x86_init_fn_sort() and must be done only once;
+ * however, you can delay sorting until you need it if you can ensure
+ * that @order_level and @supp_hardware_subarch alone account for proper
+ * ordering and dependency requirements for all init sequences prior.
+ * If no depend callback is set, it is assumed the order level
+ * (the __level passed to x86_init()) set by the init routine suffices
+ * to set the order in which the feature's callbacks are called with
+ * respect to other calls. Sorting of init calls with the same order level
+ * is determined by linker order, i.e. by placement in C code
+ * and the order objects are listed in a Makefile. A routine that depends
+ * on another is known as being subordinate to the init routine it depends
+ * on. Subordinate routines must have an order level of lower or equal
+ * priority to the order level of the init sequence they depend on.
+ * @early_init: required, routine which will run in x86_64_start_reservations()
+ * after we ensure boot_params.hdr.hardware_subarch is accessible and
+ * properly set. Memory is not yet available. This is the earliest we can
+ * currently define a common shared callback since all callbacks need to
+ * check boot_params.hdr.hardware_subarch, and this only becomes accessible
+ * on x86-64 after load_idt().
+ * @flags: optional, bitmask of enum x86_init_fn_flags
+ */
+struct x86_init_fn {
+ __u32 order_level;
+ __u32 supp_hardware_subarch;
+ bool (*detect)(void);
+ bool (*depend)(void);
+ void (*early_init)(void);
+ __u32 flags;
+};
+
+/**
+ * enum x86_init_fn_flags: flags for init sequences
+ *
+ * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
+ * has completed it can break out of the loop for init sequences on
+ * its own level.
+ * X86_INIT_DETECTED: private flag. Used by the x86 core to annotate that this
+ * init sequence has been detected and that all of its callbacks
+ * must be run during initialization.
+ */
+enum x86_init_fn_flags {
+ X86_INIT_FINISH_IF_DETECTED = BIT(0),
+ X86_INIT_DETECTED = BIT(1),
+};
+
+DECLARE_LINKTABLE_INIT(struct x86_init_fn, x86_init_fns);
+
+/* Init order levels, we can start at 0000 but reserve 0000-0999 for now */
+#define X86_INIT_ORDER_EARLY 1000
+#define X86_INIT_ORDER_NORMAL 3000
+#define X86_INIT_ORDER_LATE 5000
+
+/*
+ * Use LTO_REFERENCE_INITCALL just in case of issues with old versions of gcc.
+ * This might not be needed for linker tables due to how we compartmentalize
+ * sections and then order them at linker time, but just in case.
+ */
+
+#define x86_init(__level, \
+ __supp_hardware_subarch, \
+ __detect, \
+ __depend, \
+ __early_init) \
+ static LINKTABLE_INIT(x86_init_fns, __level) \
+ __x86_init_fn_##__early_init = { \
+ .order_level = __level, \
+ .supp_hardware_subarch = __supp_hardware_subarch, \
+ .detect = __detect, \
+ .depend = __depend, \
+ .early_init = __early_init, \
+ }; \
+ LTO_REFERENCE_INITCALL(__x86_init_fn_##__early_init);
+
+#define x86_init_early(__supp_hardware_subarch, \
+ __detect, \
+ __depend, \
+ __early_init) \
+ x86_init(X86_INIT_ORDER_EARLY, __supp_hardware_subarch, \
+ __detect, __depend, \
+ __early_init);
+
+#define x86_init_normal(__supp_hardware_subarch, \
+ __detect, \
+ __depend, \
+ __early_init) \
+ x86_init(X86_INIT_ORDER_NORMAL, __supp_hardware_subarch, \
+ __detect, __depend, \
+ __early_init);
+
+#define x86_init_early_all(__detect, \
+ __depend, \
+ __early_init) \
+ x86_init_early(X86_SUBARCH_ALL_SUBARCHS, \
+ __detect, __depend, \
+ __early_init);
+
+#define x86_init_early_pc(__detect, \
+ __depend, \
+ __early_init) \
+ x86_init_early(BIT(X86_SUBARCH_PC), \
+ __detect, __depend, \
+ __early_init);
+
+#define x86_init_early_pc_simple(__early_init) \
+ x86_init_early((BIT(X86_SUBARCH_PC)), NULL, NULL, \
+ __early_init);
+
+#define x86_init_normal_all(__detect, \
+ __depend, \
+ __early_init) \
+ x86_init_normal(X86_SUBARCH_ALL_SUBARCHS, \
+ __detect, __depend, \
+ __early_init);
+
+#define x86_init_normal_pc(__detect, \
+ __depend, \
+ __early_init) \
+ x86_init_normal((BIT(X86_SUBARCH_PC)), \
+ __detect, __depend, \
+ __early_init);
+
+
+#define x86_init_normal_xen(__detect, \
+ __depend, \
+ __early_init) \
+ x86_init_normal((BIT(X86_SUBARCH_XEN)), \
+ __detect, __depend, \
+ __early_init);
+
+/**
+ * x86_init_fn_early_init - call all applicable early_init() callbacks
+ *
+ * This calls early_init() for each supported and detected x86_init_fns entry.
+ */
+void x86_init_fn_early_init(void);
+
+/**
+ * x86_init_fn_init_tables - sort and check x86 linker table
+ *
+ * This sorts struct x86_init_fn init sequences in the x86_init_fns linker
+ * table by ensuring that init sequences that depend on other init sequences
+ * are placed later in the linker table. Init sequences that do not have
+ * dependencies are left in place. Circular dependencies are not allowed.
+ * Subordinate init sequences, that is init sequences that depend on other
+ * init sequences, must have an order level of lower or equal priority to
+ * the init sequence they depend on.
+ *
+ * This also validates semantics of all struct x86_init_fn init sequences
+ * on the x86_init_fns linker table.
+ */
+void x86_init_fn_init_tables(void);
+
+#endif /* __X86_INIT_TABLES_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b1b78ffe01d0..be167a0a5e2c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -23,6 +23,8 @@ KASAN_SANITIZE_dumpstack_$(BITS).o := n
CFLAGS_irq.o := -I$(src)/../include/asm/trace

obj-y := process_$(BITS).o signal.o
+obj-y += init.o sort-init.o
+obj-$(CONFIG_X86_DEBUG_LINKER_TABLES) += dbg-tables/
obj-$(CONFIG_COMPAT) += signal_compat.o
obj-y += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
obj-y += time.o ioport.o dumpstack.o nmi.o
diff --git a/arch/x86/kernel/dbg-tables/Makefile b/arch/x86/kernel/dbg-tables/Makefile
new file mode 100644
index 000000000000..02d12c502ad0
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/Makefile
@@ -0,0 +1,18 @@
+# You should see all these compile but at run time you'd
+# only see the ones that were linked in.
+
+# This ensures we always compile but only link
+# in alpha if it was enabled. This is typically what
+# you are expected to use to avoid code bit-rot.
+table-$(CONFIG_X86_DEBUG_LINKER_TABLE_ALPHA) += alpha.o
+
+# This accomplishes the same, but requires 2 lines.
+extra-y += beta.o
+obj-$(CONFIG_X86_DEBUG_LINKER_TABLE_BETA) += beta.o
+
+table-$(CONFIG_X86_DEBUG_LINKER_TABLE_GAMMA) += gamma.o
+
+# If you *know* you only want to enable compilation of
+# a feature when it's selected, you can just use the good
+# ol' obj- prefix.
+obj-$(CONFIG_X86_DEBUG_LINKER_TABLE_DELTA) += delta.o
diff --git a/arch/x86/kernel/dbg-tables/alpha.c b/arch/x86/kernel/dbg-tables/alpha.c
new file mode 100644
index 000000000000..54754f893a08
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/alpha.c
@@ -0,0 +1,10 @@
+#define pr_fmt(fmt) "debug-alpha: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+static void early_init_dbg_alpha(void) {
+ pr_info("early_init triggered\n");
+}
+
+x86_init_early_pc_simple(early_init_dbg_alpha);
diff --git a/arch/x86/kernel/dbg-tables/beta.c b/arch/x86/kernel/dbg-tables/beta.c
new file mode 100644
index 000000000000..7384a57fc386
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/beta.c
@@ -0,0 +1,18 @@
+#define pr_fmt(fmt) "debug-beta: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+#include "gamma.h"
+
+static bool x86_dbg_detect_beta(void)
+{
+ return true;
+}
+
+static void early_init_dbg_beta(void) {
+ pr_info("early_init triggered\n");
+}
+x86_init_early_pc(x86_dbg_detect_beta,
+ x86_dbg_detect_gamma,
+ early_init_dbg_beta);
diff --git a/arch/x86/kernel/dbg-tables/delta.c b/arch/x86/kernel/dbg-tables/delta.c
new file mode 100644
index 000000000000..9d38c68e602a
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/delta.c
@@ -0,0 +1,10 @@
+#define pr_fmt(fmt) "debug-delta: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+static void early_init_dbg_delta(void) {
+ pr_info("early_init triggered\n");
+}
+
+x86_init_early_pc_simple(early_init_dbg_delta);
diff --git a/arch/x86/kernel/dbg-tables/gamma.c b/arch/x86/kernel/dbg-tables/gamma.c
new file mode 100644
index 000000000000..7b663c1f08f4
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/gamma.c
@@ -0,0 +1,18 @@
+#define pr_fmt(fmt) "debug-gamma: " fmt
+
+#include <linux/kernel.h>
+#include <asm/x86_init.h>
+
+bool x86_dbg_detect_gamma(void)
+{
+ return true;
+}
+
+static void early_init_dbg_gamma(void)
+{
+ pr_info("early_init triggered\n");
+}
+
+x86_init_early_pc(x86_dbg_detect_gamma,
+ NULL,
+ early_init_dbg_gamma);
diff --git a/arch/x86/kernel/dbg-tables/gamma.h b/arch/x86/kernel/dbg-tables/gamma.h
new file mode 100644
index 000000000000..810d450ddd14
--- /dev/null
+++ b/arch/x86/kernel/dbg-tables/gamma.h
@@ -0,0 +1,3 @@
+#include <asm/x86_init.h>
+
+bool x86_dbg_detect_gamma(void);
diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 2911ef3a9f1c..d93f3e42e61b 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -19,6 +19,7 @@
#include <asm/bios_ebda.h>
#include <asm/tlbflush.h>
#include <asm/bootparam_utils.h>
+#include <asm/x86_init.h>

static void __init i386_default_early_setup(void)
{
@@ -47,5 +48,8 @@ asmlinkage __visible void __init i386_start_kernel(void)
break;
}

+ x86_init_fn_init_tables();
+ x86_init_fn_early_init();
+
start_kernel();
}
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 2c0f3407bd1f..a3b56aecaeda 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -28,6 +28,7 @@
#include <asm/bootparam_utils.h>
#include <asm/microcode.h>
#include <asm/kasan.h>
+#include <asm/x86_init.h>

/*
* Manage page tables very early on.
@@ -190,6 +191,9 @@ void __init x86_64_start_reservations(char *real_mode_data)
if (!boot_params.hdr.version)
copy_bootdata(__va(real_mode_data));

+ x86_init_fn_init_tables();
+ x86_init_fn_early_init();
+
reserve_ebda_region();

switch (boot_params.hdr.hardware_subarch) {
diff --git a/arch/x86/kernel/init.c b/arch/x86/kernel/init.c
new file mode 100644
index 000000000000..b012c59f11dc
--- /dev/null
+++ b/arch/x86/kernel/init.c
@@ -0,0 +1,55 @@
+#define pr_fmt(fmt) "x86-init: " fmt
+
+#include <linux/bug.h>
+#include <linux/kernel.h>
+
+#include <asm/x86_init_fn.h>
+#include <asm/bootparam.h>
+#include <asm/boot.h>
+#include <asm/setup.h>
+
+DEFINE_LINKTABLE_INIT(struct x86_init_fn, x86_init_fns);
+
+static bool x86_init_fn_supports_subarch(struct x86_init_fn *fn)
+{
+ if (!fn->supp_hardware_subarch) {
+ pr_err("Init sequence fails to declare any supported subarchs: %pF\n", fn->early_init);
+ WARN_ON(1);
+ }
+ if (BIT(boot_params.hdr.hardware_subarch) & fn->supp_hardware_subarch)
+ return true;
+ return false;
+}
+
+void __ref x86_init_fn_early_init(void)
+{
+ bool detected;
+ struct x86_init_fn *init_fn;
+ unsigned int num_inits = LINKTABLE_SIZE(x86_init_fns);
+
+ if (!num_inits)
+ return;
+
+ pr_debug("Number of init entries: %u\n", num_inits);
+
+ LINKTABLE_FOR_EACH(init_fn, x86_init_fns) {
+ if (!x86_init_fn_supports_subarch(init_fn))
+ continue;
+ if (!init_fn->detect) {
+ init_fn->flags |= X86_INIT_DETECTED;
+ } else {
+ detected = init_fn->detect();
+ if (detected)
+ init_fn->flags |= X86_INIT_DETECTED;
+ }
+
+ /* No detect routine, or detect() returned true: run it. */
+ if (init_fn->flags & X86_INIT_DETECTED) {
+ pr_debug("Running early init %pF ...\n", init_fn->early_init);
+ init_fn->early_init();
+ pr_debug("Completed early init %pF\n", init_fn->early_init);
+ if (init_fn->flags & X86_INIT_FINISH_IF_DETECTED)
+ break;
+ }
+ }
+}
diff --git a/arch/x86/kernel/sort-init.c b/arch/x86/kernel/sort-init.c
new file mode 100644
index 000000000000..c03669f6f9d6
--- /dev/null
+++ b/arch/x86/kernel/sort-init.c
@@ -0,0 +1,114 @@
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <asm/x86_init_fn.h>
+
+static struct x86_init_fn *x86_init_fn_find_dep(struct x86_init_fn *start,
+ struct x86_init_fn *finish,
+ struct x86_init_fn *q)
+{
+ struct x86_init_fn *p;
+
+ if (!q || !q->depend)
+ return NULL;
+
+ for (p = start; p < finish; p++)
+ if (p->detect == q->depend)
+ return p;
+
+ return NULL;
+}
+
+static void x86_init_fn_sort(struct x86_init_fn *start,
+ struct x86_init_fn *finish)
+{
+
+ struct x86_init_fn *p, *q, tmp;
+
+ for (p = start; p < finish; p++) {
+again:
+ q = x86_init_fn_find_dep(start, finish, p);
+ /*
+ * We are a bit sneaky here. We use the memory address to figure
+ * out if the node we depend on is past our point, if so, swap.
+ */
+ if (q > p) {
+ tmp = *p;
+ memmove(p, q, sizeof(*p));
+ *q = tmp;
+ goto again;
+ }
+ }
+
+}
+
+static void x86_init_fn_check(struct x86_init_fn *start,
+ struct x86_init_fn *finish)
+{
+ struct x86_init_fn *p, *q, *x;
+
+ /* Simple cyclic dependency checker. */
+ for (p = start; p < finish; p++) {
+ if (!p->depend)
+ continue;
+ q = x86_init_fn_find_dep(start, finish, p);
+ x = x86_init_fn_find_dep(start, finish, q);
+ if (p == x) {
+ pr_info("CYCLIC DEPENDENCY FOUND! %pF depends on %pF and vice-versa. BREAKING IT.\n",
+ p->early_init, q->early_init);
+ /* Heavy handed way..*/
+ x->depend = NULL;
+ }
+ }
+
+ /*
+ * Validate sorting semantics.
+ *
+ * p depends on q so:
+ * - q must run first, so q < p. If q > p that's an issue
+ * as its saying p must run prior to q. We already sorted
+ * this table, this is a problem.
+ *
+ * - q's order level must be <= p's as it should run first
+ */
+ for (p = start; p < finish; p++) {
+ if (!p->depend)
+ continue;
+ /*
+ * Be pedantic and do a full search on the entire table,
+ * if we need further validation, after this is called
+ * one could use an optimized version which just searches
+ * on x86_init_fn_find_dep(p, finish, p), as we would have
+ * a guarantee of proper ordering both at the dependency level
+ * and by order level.
+ */
+ q = x86_init_fn_find_dep(start, finish, p);
+ if (q && q > p) {
+ pr_info("EXECUTION ORDER INVALID! %pF should be called before %pF!\n",
+ q->early_init, p->early_init);
+ }
+
+ /*
+ * Technically this would still work as the memmove() would
+ * have forced the dependency to run first, however we want
+ * strong semantics, so let's avoid these.
+ */
+ if (q && q->order_level > p->order_level) {
+ pr_info("INVALID ORDER LEVEL! %pF should have an order level <= that of %pF!\n",
+ q->early_init, p->early_init);
+ }
+ }
+}
+
+void __ref x86_init_fn_init_tables(void)
+{
+ unsigned int num_inits = LINKTABLE_SIZE(x86_init_fns);
+
+ if (!num_inits)
+ return;
+
+ x86_init_fn_sort(LINKTABLE_START(x86_init_fns),
+ LINKTABLE_END(x86_init_fns));
+ x86_init_fn_check(LINKTABLE_START(x86_init_fns),
+ LINKTABLE_END(x86_init_fns));
+}
--
2.7.0