Re: [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies

From: Andy Lutomirski
Date: Fri Feb 19 2016 - 11:41:40 EST


On Fri, Feb 19, 2016 at 6:15 AM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> Any failure on the x86 init path can be catastrophic.
> A simple shift of a call from one place to another can
> easily break things. Likewise adding a new call to
> one path without considering all x86 requirements
> can make certain x86 run time environments crash.
> We currently account for these requirements through
> peer code review and run time testing. We could do
> much better if we had a clean and simple way to annotate
> strong semantics for run time requirements, init sequence
> dependencies, and detection mechanisms for additions of
> new x86 init sequences.

Please document this in a way that will be useful for people trying to
understand what the code does as opposed to just people who are trying
to understand why you wrote it. More below.



> +/**
> + * struct x86_init_fn - x86 generic kernel init call
> + *
> + * Linux x86 features vary in complexity, features may require work done at
> + * different levels of the full x86 init sequence. Today there are also two
> + * different possible entry points for Linux on x86, one for bare metal, KVM
> + * and Xen HVM, and another for Xen PV guests / dom0. Assuming a bootloader
> + * has set up 64-bit mode, roughly the x86 init sequence follows this path:
> + *
> + *  Bare metal, KVM, Xen HVM                     Xen PV / dom0
> + *        startup_64()                           startup_xen()
> + *              \                                     /
> + *      x86_64_start_kernel()               xen_start_kernel()
> + *                         \                    /
> + *                      x86_64_start_reservations()
> + *                                   |
> + *                             start_kernel()
> + *                             [ ... ]
> + *                             [ setup_arch() ]
> + *                             [ ... ]
> + *                                 init
> + *


I don't think this paragraph below is necessary. I also think it's
very confusing. Keep in mind that the reader has no idea what a
"level" is at this point and that the reader also doesn't need to
think about terms like "paravirtualization yielding".

> + * x86_64_start_kernel() and xen_start_kernel() are the respective first C code
> + * entry starting points. The different entry points exist to enable Xen to
> + * skip a lot of hardware setup already done and managed on behalf of the
> + * hypervisor, we refer to this as "paravirtualization yielding". The different
> + * levels of init calls on the x86 init sequence exist to account for these
> + * slight differences and requirements. These different entry points also share
> + * a common entry x86 specific path, x86_64_start_reservations().
> + *

And here, I don't even know what a "feature" is.

> + * A generic x86 feature can have different initialization calls, one on each
> + * of the different main x86 init sequences, but must also address both entry
> + * points in order to work properly across the board on all supported x86
> + * subarchitectures. Since x86 features can also have dependencies on other
> + * setup code or features, x86 features can at times be subordinate to other
> + * x86 features, or conditions. struct x86_init_fn enables feature developers
> + * to annotate dependency relationships to ensure subsequent init calls only
> + * run once a subordinate's dependencies have run. When needed custom
> + * dependency requirements can also be spelled out through a custom dependency
> + * checker. In order to account for the dual entry point nature of x86-64 Linux
> + * for "paravirtualization yielding" and to make annotations for support for
> + * these explicit each struct x86_init_fn must specify supported
> + * subarchitectures. The earliest x86-64 code can read the subarchitecture
> + * though is after load_idt(), as such the earliest we can currently rely on
> + * subarchitecture for semantics and a common init sequences is on the shared
> + * common x86_64_start_reservations(). Each struct x86_init_fn must also
> + * declare a two-digit decimal number to impose an ordering relative to other
> + * features when required.

I'm totally lost in the paragraph above.


> + *
> + * x86_init_fn enables strong semantics and dependencies to be defined and
> + * implemented on the full x86 initialization sequence.

Please try explaining what this is, instead. For example:

An x86_init_fn represents a function to be called at a certain point
during initialization on a specific set of subarchitectures.

> + *
> + * @order_level: must be set, linker order level, this corresponds to the table
> + * section sub-table index, we record this only for semantic validation
> + * purposes.

I read this as "this is purely a debugging feature". Is it? I think it's not.

> Order-level is always required however you typically would
> + * only use X86_INIT_NORMAL*() and leave ordering to be done by placement
> + * of code in a C file and the order of objects through a Makefile. Custom
> + * order-levels can be used when order on C file and order of objects on
> + * Makfiles does not suffice or much further refinements are needed.


Assuming I understand this correctly, here's how I'd write it:

@order_level: The temporal order of this x86_init_fn. Lower
order_level numbers are called first. Ties are broken by order found
in the Makefile and then by order in the C file.

Note, however, that my proposed explanation can't be right because it
appears to conflict with "depend". Please adjust accordingly.

> + * @supp_hardware_subarch: must be set, it represents the bitmask of supported
> + * subarchitectures. We require each struct x86_init_fn to have this set
> + * to require developer considerations for each supported x86
> + * subarchitecture and to build strong annotations of different possible
> + * run time states particularly in consideration for the two main
> + * different entry points for x86 Linux, to account for paravirtualization
> + * yielding.
Too much motivation, too little documentation.

@supp_hardware_subarch: A bitmask of subarchitectures on which to call
this init function.

--- start big deletion ---

> + *
> + * The subarchitecture is read by the kernel at early boot from the
> + * struct boot_params hardware_subarch. Support for the subarchitecture
> + * exists as of x86 boot protocol 2.07. The bootloader would have set up
> + * the respective hardware_subarch on the boot sector as per
> + * Documentation/x86/boot.txt.
> + *
> + * What x86 entry point is used is determined at run time by the
> + * bootloader. Linux pv_ops was designed to help enable to build one Linux
> + * binary to support bare metal and different hypervisors. pv_ops setup
> + * code however is limited in that all pv_ops setup code is run late in
> + * the x86 init sequence, during setup_arch(). In fact cpu_has_hypervisor
> + * only works after early_cpu_init() during setup_arch(). If an x86
> + * feature requires an earlier determination of what hypervisor was used,
> + * or if it needs to annotate only support for certain hypervisors, the
> + * x86 hardware_subarch should be set by the bootloader and
> + * @supp_hardware_subarch set by the x86 feature. Using hardware_subarch
> + * enables x86 features to fill the semantic gap between the Linux x86
> + * entry point used and what pv_ops has to offer through a hypervisor
> + * agnostic mechanism.

--- end big deletion ---

> + *
> + * Each supported subarchitecture is set using the respective
> + * X86_SUBARCH_* as a bit in the bitmask. For instance if a feature
> + * is supported on PC and Xen subarchitectures only you would set this
> + * bitmask to:
> + *
> + * BIT(X86_SUBARCH_PC) |
> + * BIT(X86_SUBARCH_XEN);

I like this part, but how about "For instance, if an init function
should be called on PC and Xen subarchitectures only, you would set
the bitmask to..."?

> + *
> + * @detect: optional, if set returns true if the feature has been detected to
> + * be required, it returns false if the feature has been detected to not
> + * be required.

I have absolutely no idea what this means.

> + * @depend: optional, if set this set of init routines must be called prior to
> + * the init routine who's respective detect routine we have set this
> + * depends callback to. This is only used for sorting purposes given
> + * all current init callbacks have a void return type. Sorting is
> + * implemented via x86_init_fn_sort(), it must be called only once,
> + * however you can delay sorting until you need it if you can ensure
> + * only @order_level and @supp_hardware_subarch can account for proper
> + * ordering and dependency requirements for all init sequences prior.
> + * If you do not have a depend callback set its assumed the order level
> + * (__x86_init_fn(level)) set by the init routine suffices to set the
> + * order for when the feature's respective callbacks are called with
> + * respect to other calls. Sorting of init calls with the same order level
> + * is determined by linker order, determined by order placement on C code
> + * and order listed on a Makefile. A routine that depends on another is
> + * known as being subordinate to the init routine it depends on. Routines
> + * that are subordinate must have an order-level of lower priority or
> + * equal priority than the order-level of the init sequence it depends on.

I don't understand this at all. I assume you're saying that some kind
of topological sorting happens. This leads to a question: why is
"depend" a function pointer? Shouldn't it be x86_init_fn*? Also,
what happens if you depend on something that is disabled on the
running subarch? And what happens if the depend-implied order is
inconsistent with order_level?

I would hope that order_level breaks ties in the topological sort and
that there's a big fat warning if the order_level ordering is
inconsistent with the topological ordering. Preferably the warning
would be at build time, in which case it could be an error.
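Something along these lines at sort time would at least give you the
runtime version of that warning (a rough sketch reusing your find_dep
helper; the function name and message are placeholders):

static void __init x86_init_fn_check_levels(struct x86_init_fn *start,
					    struct x86_init_fn *finish)
{
	struct x86_init_fn *q, *dep;

	for (q = start; q < finish; q++) {
		if (!q->depend)
			continue;
		dep = x86_init_fn_find_dep(start, finish, q);
		/* A dependency must not sort after its dependent. */
		if (dep && dep->order_level > q->order_level)
			WARN(1, "x86/init: order_level contradicts depend\n");
	}
}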

> + * @early_init: required, routine which will run in x86_64_start_reservations()
> + * after we ensure boot_params.hdr.hardware_subarch is accessible and
> + * properly set. Memory is not yet available. This the earliest we can
> + * currently define a common shared callback since all callbacks need to
> + * check for boot_params.hdr.hardware_subarch and this becomes accessible
> + * on x86-64 until after load_idt().

What's this for? Under what conditions is it called? In what order?
Why is it part of x86_init_fn at all?

> + * @flags: optional, bitmask of enum x86_init_fn_flags

What are these flags? What do they do?

> + */
> +struct x86_init_fn {
> +	__u32 order_level;
> +	__u32 supp_hardware_subarch;
> +	bool (*detect)(void);
> +	bool (*depend)(void);
> +	void (*early_init)(void);
> +	__u32 flags;
> +};
> +
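
As a general comment: once the documentation says what each field
means, a short example in the kerneldoc would help a lot. This is
roughly what I'd expect a user of this to write (all names and the
order level are made up, and I'm ignoring the X86_INIT_NORMAL*()
registration macros since they're not shown in this hunk):

static bool early_foo_detect(void)
{
	/* e.g. look at boot_params.hdr.hardware_subarch */
	return true;
}

static void early_foo_init(void)
{
	/* runs from x86_64_start_reservations(), before memory is up */
}

static struct x86_init_fn early_foo = {
	.order_level		= 30,
	.supp_hardware_subarch	= BIT(X86_SUBARCH_PC) | BIT(X86_SUBARCH_XEN),
	.detect			= early_foo_detect,
	.early_init		= early_foo_init,
};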
> +/**
> + * enum x86_init_fn_flags: flags for init sequences
> + *
> + * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
> + * has completed it can break out of the loop for init sequences on
> + * its own level.

What does this mean?

> + * X86_INIT_DETECTED: private flag. Used by the x86 core to annotate that this
> + * init sequence has been detected and it all of its callbacks
> + * must be run during initialization.

Please make this an entry in a new field scratch_space or similar and
just document the entire field as "private to the init core code".
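
I.e. roughly:

struct x86_init_fn {
	__u32 order_level;
	__u32 supp_hardware_subarch;
	bool (*detect)(void);
	bool (*depend)(void);
	void (*early_init)(void);
	__u32 flags;		/* bitmask of enum x86_init_fn_flags */
	__u32 scratch;		/* private to the init core code */
};

with X86_INIT_DETECTED (or just a bool) living in the private field
instead of in @flags.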

> +static struct x86_init_fn *x86_init_fn_find_dep(struct x86_init_fn *start,
> +						struct x86_init_fn *finish,
> +						struct x86_init_fn *q)
> +{
> +	struct x86_init_fn *p;
> +
> +	if (!q)
> +		return NULL;
> +
> +	for (p = start; p < finish; p++)
> +		if (p->detect == q->depend)
> +			return p;

That's very strange indeed, and it doesn't seem consistent with my
explanation. Please fix up the docs to explain what's going on.
Again, as a reviewer and eventual user of this code, I do not need to
know what historical problem it solves. I need to know what the code
does and how to use it.

> +
> +void __ref x86_init_fn_init_tables(void)

Why is this __ref and not __init?

--Andy