Re: [PATCH 04/16] arm64: capabilities: Prepare for fine grained capabilities

From: Suzuki K Poulose
Date: Thu Jan 25 2018 - 12:09:08 EST


On 25/01/18 13:43, Dave Martin wrote:
On Wed, Jan 24, 2018 at 06:45:01PM +0000, Suzuki K Poulose wrote:
Dave,

Thanks a lot for taking the time to review the patch and for
opening up the discussion ! See my comments below.

On 23/01/18 17:33, Dave Martin wrote:
On Tue, Jan 23, 2018 at 12:27:57PM +0000, Suzuki K Poulose wrote:
We use arm64_cpu_capabilities to represent CPU ELF HWCAPs exposed
to the userspace and the CPU hwcaps used by the kernel, which
include cpu features and CPU errata work arounds.

At the moment we have the following restricions:

a) CPU feature hwcaps (arm64_features) and ELF HWCAPs (arm64_elf_hwcap)
- Detected mostly on system wide CPU feature register. But
there are some which really uses a local CPU's value to
decide the availability (e.g, availability of hardware
prefetch). So, we run the check only once, after all the
boot-time active CPUs are turned on.
- Any late CPU which doesn't posses all the established features
is killed.
- Any late CPU which possess a feature *not* already available
is allowed to boot.

b) CPU Errata work arounds (arm64_errata)
- Detected mostly based on a local CPU's feature register.
The checks are run on each boot time activated CPUs.
- Any late CPU which doesn't have any of the established errata
work around capabilities is ignored and is allowed to boot.
- Any late CPU which has an errata work around not already available
is killed.

However there are some exceptions to the cases above.

1) KPTI is a feature that we need to enable when at least one CPU needs it.
And any late CPU that has this "feature" should be killed.
2) Hardware DBM feature is a non-conflicting capability which could be
enabled on CPUs which has it without any issues, even if the CPU is
brought up late.

So this calls for a lot more fine grained behavior for each capability.
And if we define all the attributes to control their behavior properly,
we may be able to use a single table for the CPU hwcaps (not the
ELF HWCAPs, which cover errata and features). This is a prepartory step
to get there. We define type for a capability, which for now encodes the
scope of the check. i.e, whether it should be checked system wide or on
each local CPU. We define two types :

1) ARM64_CPUCAP_BOOT_SYSTEM_FEATURE - Implies (a) as described above.
1) ARM64_CPUCAP_STRICT_CPU_LOCAL_ERRATUM - Implies (b) as described above.

I'm a bit confused by the naming here.


I understand they are not well documented. I will try to explain here and
stick it in somewhere, in the next revision.

There are 3 aspects for any capability.

1) SCOPE: How is the capability detected ?
i.e, whether it should be detected based on at least one CPU's ID register
(e.g, MIDR) (SCOPE_CPU_LOCAL) or System wide value (like ELF HWCAP)(SCOPE_SYSTEM)

This is fairly straight forward and hasn't changed much. The only
change we need is to allow for CPU "Features" that could be detected on
a local CPU. So, "LOCAL" tag has been added to the capabilities which
are CPU local.

I think my confusion here is mostly semantic, so I won't worry too
much about it for now as I review the patches.

<aside>

It is confusing code to dive into though: there is too much overlap
and ambiguity between the intuitive meanings of "feature", "capability",
"erratum" etc. I think this is just the way the code evolved:
"capability" is an intuitive word to describe that the system can do
something, but it's a strange word to use for the presence of an
erratum that must be worked around.

The immutable properties of CPUs, and the desicion taken by Linux about
what to do with them, are two quite different things. I find it easy to
confuse myself about which is being referred to when we talk about
"capabilities", partly because the intuitive meaning of this word often
seems to point to the wrong answer.

Choosing different words might help, but precise definitions and
consistent would also help.

Here, "detecting" sounds like discovery of an immutable characteristic
of something, but really in the System-wide case you're talking about
making a policy decision (i.e., what tradeoff to make between working
with the boot and/or early CPUs, and working with late CPUs that we
haven't seen yet). It would be equally valid to make a different
decision and reject some _early_ CPUs in the hopes that this will
result in a better performing system (perhaps because we can now
enable a desirable feature globally, or perhaps to minimise likelihood
of rejecting late CPUs).

</aside>

Anyway, this isn't a high priority. It's just something that takes a
little time to get my head around...

2) WHEN: When is the capability "enabled" or in other words, when does the
kernel start using the capability. At the moment there is only
one place. After all the CPUs are brought up during the kernel init, from
setup_cpu_features. (But this will change when we add support for enabling
some capabilities very early during the boot, based on the boot CPU. e.g, VHE
and GIC CPUIF - for NMI).

For now, unless you see "EARLY", in the type, everything is enabled from
setup_cpu_features.

Agreed -- I'd forgotten about this. To what extent is this orthogonal
to 1/3/4?

Apologies for not posting the work here, it was delayed due to some
issues with how we handle VHE. I will include it in the next version.

The Early capabilities are based on Boot CPU, and is "detected" only on boot
CPU. So, scope implies SCOPE_CPU_LOCAL. As for (2), the capability is enabled
on the boot CPU right after it is detected, to allow early usage of the feature.
For any other CPU (including the boot time activated ones) we make sure that
they are verified against "early" capabilities.

To make the rule a bit generic, any CPU is verified against the already
"enabled" capabilities. There are two stages during the boot when the capabilities
are enabled.
1) Early caps: Detected and Enabled on the boot CPU, before any other CPU is
brought up.
2) All the other caps: Could be detected on CPU / System wide. But enabled
only after all the boot time active CPUs are up.

So, in any context, "LATE" tag now implies "after the particular capability"
is enabled by the kernel.

---------------------------------------------------------------
Verification: | Boot CPU | SMP CPUs by kernel | CPUs by user |
---------------------------------------------------------------
Early cap | n | y | y |
---------------------------------------------------------------
All other cap | n | n | y |
---------------------------------------------------------------

The above table kind of shows how the capabilities are checked for conflicts.
Any early capability is treated as critical and a conflict will result in a
Panic, unless the non-compatible CPU is parked safely. (e.g, for VHE we park
the late CPU with MMU off).

3) WHERE: Where all do we apply the "enable" for a given capability, when it
is to be enabled. Do we run it on all CPUs or only on the CPUs with the capability.
We don't deal with this in the infrastructure and we run the enable callback
on all CPUs irrespective of when the CPU is booted. It is upto the callback
to decide if it needs to take some action on a given CPU.


4) CONFLICT: How should we treat a conflict of a capability on a CPU which
boots after the capability was enabled in the kernel (as per 2, it could
be earlier or after smp CPUs are brought up) ?

Here are the possible combinations, representing a Conflict:

System | Late Booting CPU
a) y | n
b) n | y

(a) Is considered safe for features like Erratum and KPTI where some
actions have to be taken to make the system behave in desirable manner.
However (a) is not considered safe for some features we advertise to the
user (e.g, ELF HWCAP)

(b) Is considered safe for most of the features (except KPTI and Software
prefetching), while it is unsafe for Errata, as we couldn't possibly take
any action as the time for applying the actions (enabling as per (2) above)
has passed already.


So we have the following cases :
i) a_Safe && b_not_Safe
ii) a_not_Safe && b_Safe
iii) a_Safe && b_Safe

Is this the same as saying that the CONFLICT type (iii) is not
applicable to capabilites that are not decided globally?

Practically, yes. But we don't want to label it as such, as we don't know
what future features could come with.


I have used "STRICT" tag for category (i) and "WEAK" for (iii). (ii) doesn't
have any tag. I acknowledge that this scheme is not particularly intuitive.
However I have tried to explain the behavior of "each" defined type in a
comment above.


I'm not keen on the word "strict" because it doesn't really mean
anything unless the rule to be strictly enforced is documented, and
then doesn't really add anything: you could take the view that either
there's an (enforced) rule or there's no rule at all.

As I mentioned above, each type is documented on the rules. But the tags
are not particularly helpful.


In this case, (b) seems the "relaxed" case anyway: we allow these
capabilities to be enabled/disabled differently on different CPUs,
rather than requiring the same configuration on all CPUs (KPTI).

I think there are 2 x 2 orthogonal options:

A FEATURE capability must be disabled on every CPU where the feature is
not present. A feature capability is considered desirable, and should
be enabled when possible.

An ERRATUM capability must be enabled on every CPU where the erratum is
present. An erratum capability is considered undesirable, and should
be disabled when possible.

(An erratum is the inverse of a feature with respect to desirability,
enablement and presence.)

I would really like to avoid tagging the capabilities FEATURE/ERRATUM
to indicate their behavior, because :

1) We could be naming a "feature" ERRATUM, where it is really not one.
e.g, KPTI is a security feature and not an erratum.

2) We could be "marking" all "CPU Errata" unsafe when detected late.
e.g, Cortex-A55 errata 1024718, can be safely handled by disabling the
DBM feature and the conflict can be "ignored" (even though it is not
handled via capability).

That's a fair point. "Erratum" is describing a set of constraints
between the CPUs and kernel that is typical of errata handing, but could
also apply to other features. So I guess it's not the best choice of
word.

So using the FEATURE/ERRATA will cause further confusion on the real
type of the capability. Hence, it would be better if we came up with
something other to make it clear.

I guess it may not be quite as clear-cut as I'm hoping...

and:

A GLOBAL capability must be enabled on all CPUs or disabled on all CPUs.
(Thus, a decision must be taken during boot, based on desirability/
undesirability of each capability and based on the assumption that
late CPUs won't violate the decision -- unless this assumption is
overridden by a cmdline arg etc.)

A LOCAL capability may freely be enabled on some CPUs while disabled on
other CPUs.

As discussed in (3) above, I would like to leave it to the "enable()" method
to decide where actions are need, to avoid adding more combinations of tags
to the type names. We continue to invoke enable() on all CPUs.

Sure -- if the decisions are made enable() methods rather than blanket
rules, then we don't need a distinct capability type for every possible
case. For the very common cases like the ELF HWCAPs it may be worth it,
but not for fiddly one-off things.

The types are just a name for collective flags and sometimes there may be
same bits in different names. This is for making it easier to add new
entries to the table with the right bits set.


(Thus, LOCAL is precisely the inverse of GLOBAL.)


So:

ARM64_CPUCAP_GLOBAL_FEATURE (hwcap case)
ARM64_CPUCAP_LOCAL_FEATURE (DBM case)
ARM64_CPUCAP_GLOBAL_ERRATUM (KPTI case)
ARM64_CPUCAP_LOCAL_ERRATUM (regular CPU erratum case).

("PERCPU" could be substituted for "LOCAL" if we want to be clearer
about the way in which LOCAL is more permissive.)

In general, I understand the names could get some improved scheme.
I think if we come up with some nice tag for types i, ii & ii in section (4)
we should be good to go.

This is always something that could be reexamined later.

I think I understand the basic ideas here a bit better now -- thanks.

[...]


Thanks
Suzuki