Re: [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and annotate dependencies

From: Luis R. Rodriguez
Date: Fri Feb 26 2016 - 19:20:17 EST


On Fri, Feb 19, 2016 at 11:30:43PM -0800, Andy Lutomirski wrote:
> On Fri, Feb 19, 2016 at 4:42 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxx> wrote:
> > On Fri, Feb 19, 2016 at 08:40:49AM -0800, Andy Lutomirski wrote:
> >> On Fri, Feb 19, 2016 at 6:15 AM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> >> And here, I don't even know what a "feature" is.
> >
> > Kasan is an example.
>
> Is kasan different from the cr4 shadow in this context? I don't
> really feel like calling the cr4 shadow a "feature" is meaningful. Am
> I misunderstanding something?

Its an early boot call that we can share with the different boot entries,
or that each boot entry must call. Kasan is different in that is has a series
of requirements for each boot entry, it needs an early boot call, then one
at setup_arch(). In the context of how I'm implementing callbacks just as
an example, cr4 shadow would just have one callback, whereas kasan could
have an early_init() callback and a setup_arch() callback.

> >> > + * A generic x86 feature can have different initialization calls, one on each
> >> > + * of the different main x86 init sequences, but must also address both entry
> >> > + * points in order to work properly across the board on all supported x86
> >> > + * subarchitectures. Since x86 features can also have dependencies on other
> >> > + * setup code or features, x86 features can at times be subordinate to other
> >> > + * x86 features, or conditions. struct x86_init_fn enables feature developers
> >> > + * to annotate dependency relationships to ensure subsequent init calls only
> >> > + * run once a subordinate's dependencies have run. When needed custom
> >> > + * dependency requirements can also be spelled out through a custom dependency
> >> > + * checker. In order to account for the dual entry point nature of x86-64 Linux
> >> > + * for "paravirtualization yielding" and to make annotations for support for
> >> > + * these explicit each struct x86_init_fn must specify supported
> >> > + * subarchitectures. The earliest x86-64 code can read the subarchitecture
> >> > + * though is after load_idt(), as such the earliest we can currently rely on
> >> > + * subarchitecture for semantics and a common init sequences is on the shared
> >> > + * common x86_64_start_reservations(). Each struct x86_init_fn must also
> >> > + * declare a two-digit decimal number to impose an ordering relative to other
> >> > + * features when required.
> >>
> >> I'm totally lost in the paragraph above.
> >
> > As an example Kasan could be defined as a struct x86_init_fn with a respective
> > callback on x86_64_start_kernel() (kasan_early_init()) and later setup_arch()
> > (kasan_init()). Since we can't support for struct x86_init_fn calls to start all
> > the way from x86_64_start_kernel() (given the issues I had noted before)
> > the earliest a struct x86_init_fn can be used for is for a callback on
> > x86_64_start_reservations().
>
> I'm still lost. Kasan is a pile of code that does something. Kasan
> has some initialization code. Do you mean that kasan's initialization
> code could be encapsulated in one or more x86_init_fn instances?

Yup.

Kasan has for instance:

arch/x86/kernel/head64.c:

asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
{
...
kasan_early_init()
...
}

Then on arch/x86/kernel/setup.c:

void __init setup_arch(char **cmdline_p)
{
...
kasan_init();
...
}

With x86_init_fn we'd have something like within arch/x86/mm/kasan_init_64.c:

static LINKTABLE_INIT_DATA(x86_init_fns, X86_INIT_ORDER_EARLY)
__x86_init_fn_kasan_early_init = {
.order_level = X86_INIT_ORDER_EARLY,
.supp_hardware_subarch = BIT(X86_SUBARCH_PC),
.early_init = kasan_early_init,
.setup_arch = kasan_init,
};

Only we'd wrap this in a macro wrapper to make it look something like:

x86_init_early_pc_setup(kasan_early_init, kasan_init);

With that in place we can remove the explicit kasan_init() call on setup.c
and kasan_early_init() in head64.c as x86_64_start_kernel() would have
a single x86_init_fn_early_init(); and setup_arch() would have something
like x86_init_fn_setup_arch() -- each of those would iterate over the
x86_init_fns linker table, and call their respective callback. Please
note though that since subarch is currently only readable until after
load_idt() it means though that the we can't fold kasan into this
framework yet, its why I placed x86_init_fn_early_init() until
x86_64_start_reservations().

The proactive measure here would be that if a developer were to add a feature
similar in nature to Kasan they'd have to think about what subarch they
support and vet / ensure they work, and if they can't get it to work at
least they'd need to design it in a way that enables it to be disabled
at run time. Right now Kasan cannot be disabled at run time.

> > It just means we can different callbacks used for different parts of the
> > x86 init process, it explains the limits to the callbacks we can have right now.
>
> Do you mean that there would be different linker tables containing
> x86_init_fn instances for each of these callback points? Or is
> "early_init" the only such callback point.

No we'd have only one linker table: x86_init_fns, and in it would be
structs of type struct x86_init_fn. We can iterate over the linker
the table anywhere, in our case since we're using it to help with
the init process, and since x86 features can have calls in different
parts of the x86 init process we can just iterate over the linker table
during these difference key places and have it call the needed callback
for each place.

So x86_init_fn_early_init() would call early_init(). Its the only callback
we have right now. Later, we may add a setup_arch() callback and then
place a x86_init_fn_setup_arch() on setup_arch() to remove clutter there.

> BTW, why is early_init called early_init? Why not call it "callback"?
> Then you could reuse this for other things.

Because its trying to make it clear where in the x86 init path its going
to be called, in this case today the earliest I can get subarch to be
read is x86_64_start_reservations().

> >> > + * @detect: optional, if set returns true if the feature has been detected to
> >> > + * be required, it returns false if the feature has been detected to not
> >> > + * be required.
> >>
> >> I have absolutely no idea what this means.
> >
> > Well, so the order level stuff is used at link time, there however are some
> > run time environment conditions that might be important to consider as well,
> > the detect lets you write a C code routine which will check to see if the
> > feature should run, it returns false if the feature should not run or is
> > not required to run.
>
> So how is:
>
> static bool detect_foo(void) { return whatever; }
> statis void init_foo(void) { do something; }
>
> .detect = detect_foo,
> .early_init = init_foo,
>
> different from:
>
> static void init_foo(void)
> {
> if (whatever) do something;
> }
>
> .early_init = init_foo,

Its not, but the key here is not @detect but rather the pair of both @detect
and @depend, the @depend would refer to a @detect routine from another
"feature". We then just run time sort things to kick off in the right order --
this would help for cases the linker sort would not suffice (things which can
only be determined at run time). The run time sort was added for IOMMU
originally to handle a complex dependency chain which could only be determined
at run time. Instead of sprinkling checks all over we have the dependencies
clearly annotated with this simple structure and have a run time sort, sort the
init sequences as they need to happen. Its important to not only ensure order
but also to ensure things that *should not run* don't run.

Refer to the implementation in arch/x86/kernel/sort-init.c which is
heavily borrowed from the existing IOMMU run time sort we do to handle
the run time dependencies there. The reason we can't share sort between
x86 and IOMMU is the structs vary slightly, so the structs are of
different size and the semantics may end up being very different. Lastly
if a bug in sort / semantics in IOMMU we could end up regressing x86
init as well. We can't possibly assume we can share the same semantics
across the board for all users of linker tables. In this case its best to
just keep these linker tables separately and enable their own semantics to
be defined independently.

> > A cheesy way to implement a cheesy graph is to then use the same callback
> > as a pointer for dependency, so that if a feature depends on another, both
> > are on the same order level, and if they have a run time dependency
> > annotation, they can further annotate this relationship by having the
> > thing that depends, set its depends pointer to refer to the other
> > feature's detect callback.
> >
> > This is borrowed from the IOMMU init code in the kernel. Same logic.
> > Only thing is we also get link order support too. Link order semantics
> > and logic should suffice for most things, but not all as with the IOMMU
> > init stuff.
>
> But this doesn't work at all if .detect isn't set.

I've changed the semantics for x86. So for x86 this is optional, if
detect is null we ignore it. Likewise if depend is null, we ignore it
for sorting purposes.

> >> And what happens if the depend-implied order is
> >> inconsistent with order_level.
> >
> > Ah -- that's what I was trying to explain above in the comment
> > about inconsistent state -- a run time check *after* we do run
> > time sorting goes through and checks for sanity.
> >
> > All this is a light weight feature graph, if you will, other
> > folks are working on different solutions for this and while
> > I have considered sharing a solution for early init code this
> > seems a bit more ideal for now. Later in the future though we
> > should probably get together and talk about a proper feature
> > graph, with more consistent checks, etc. Its another reason
> > why I am also curious to see a SAT solver at run time in the
> > future, but this is super long term [0]...
> >
> > [0] http://kernelnewbies.org/KernelProjects/kconfig-sat
> >
> >> I would hope that order_level breaks ties in the topological sort and
> >> that there's a bit fat warning if the order_level ordering is
> >> inconsistent with the topological ordering. Preferably the warning
> >> would be at build time, in which case it could be an error.
> >
> > Order level will always be respected at link time. The depend
> > and detect thing are heuristics used at run time and since
> > it depends on run time state we can't do this at build time.
> > The big fat hairy nasty warning is there though. Curious
> > enough I ended up finding out that the sanity check for
> > IOMMU was intended to only be run at debug time but the
> > debug thing was *always* set to true, so we've been running
> > the sanity checker in the kernel for ages. Its why I also
> > feel a bit more comfortable in keeping it live and matching
> > more close to its implementation. I've gone to pains to
> > build off of the IOMMU init solution and simply extend
> > on the semantics there. Turns out there was much room for
> > enhancements. All the fine tuning are documented in the
> > userspace tree.
>
> So is this what you're saying:
>
> Callbacks are called in an order consistent with the dependencies
> specified by @depend.

No, @depend just helps with run time sorting the linker table so things that
need to go first which require run time semantics get tucked in first. This
should help with cruft checks, waiting on dependencies, so it should speed
up the process as well. For this x86 init stuff there will be one callback
per each major x86 init path, we'll start off with x86_init_fn_early_init()
which is the earliest we can get the subarch.

> Ties are broken first by @order_level and then
> by linker order.

Right.

> It is an error for an x86_init_fn with lower
> @order_level to depend on an x86_init_fn with higher @order_level.

Right. Another way to say this:

If p @depends on q, q's order level must be <= than p's as it should run first.

Note that technically if we didn't want this check we could remove it,
I tested that it would force the run time sort to then override the
order level, but that make semantics sloppy, so I've pegged this
as a requirement in definition and built in a checker for it. This is
in x86_init_fn_check().

> >> > + * @early_init: required, routine which will run in x86_64_start_reservations()
> >> > + * after we ensure boot_params.hdr.hardware_subarch is accessible and
> >> > + * properly set. Memory is not yet available. This the earliest we can
> >> > + * currently define a common shared callback since all callbacks need to
> >> > + * check for boot_params.hdr.hardware_subarch and this becomes accessible
> >> > + * on x86-64 until after load_idt().
> >>
> >> What's this for? Under what conditions is it called? What order?
> >> Why is it part of x86_init_fn at all?
> >
> > Its one of the reasons we are adding this, callbacks to for features.
> > Its called if the its requirements are met (subarch, and depends).
> > I'll work on extending this to be clearer.
>
> I think this should just be ".callback". Let the context by implied
> by the linker table in which it appears.

The thing is we want to be specific and allow one struct per main
x86 init path major call site, this lets you have one struct for
Kasan for instance, and you have semantics well spelled out.

> >
> >> > + */
> >> > +struct x86_init_fn {
> >> > + __u32 order_level;
> >> > + __u32 supp_hardware_subarch;
> >> > + bool (*detect)(void);
> >> > + bool (*depend)(void);
> >> > + void (*early_init)(void);
> >> > + __u32 flags;
> >> > +};
> >> > +
> >> > +/**
> >> > + * enum x86_init_fn_flags: flags for init sequences
> >> > + *
> >> > + * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
> >> > + * has completed it can break out of the loop for init sequences on
> >> > + * its own level.
> >>
> >> What does this mean?
> >
> > It borrows the logic from the IOMMU stuff, it enables the core to detect
> > if *any* routine in the linker table has completed we can then now just
> > bail and stop going through the rest of the linker table.
>
> Can you give an example for which this would be used?

I'm going to rip this out.

I know this will be useful but since I can't see anything obvious right now
that simple link order won't address on early boot I'm going to rip this out
in the next series. If we find we may need it later we can consider it later.
Such a sort run time sort thing might prove more useful later in boot where
asynchronous behaviour is more prevalent. Folks working in higher layers
may find more use for it.

Luis