Re: [PATCH v5] x86/cpufeature: Warn about unmet feature dependencies

From: Ingo Molnar
Date: Fri Mar 07 2025 - 06:55:42 EST



* Sohil Mehta <sohil.mehta@xxxxxxxxx> wrote:

> Currently, the cpuid_deps[] table is only exercised when a particular
> feature gets explicitly disabled and clear_cpu_cap() is called. However,
> some of these listed dependencies might already be missing during boot.
>
> These types of errors shouldn't generally happen in production
> environments but they could sometimes sneak through, especially when VMs
> and Kconfigs are in the mix. Also, the kernel might introduce artificial
> dependencies between unrelated features such as making LAM depend on
> LASS.
>
> Unexpected failures can occur when the kernel tries to use such a
> feature. Add a simple boot time scan of the cpuid_deps[] table to detect
> the missing dependencies. One option is to disable all of such features
> during boot but that may cause regressions in existing systems. For now,
> just warn about the missing dependencies to create awareness.
>
> As a trade-off between spamming the kernel log and keeping track of all
> the features that have been warned about, only warn about the first
> missing dependency. Any subsequent unmet dependency will only be logged
> after the first one has been resolved.
>
> Features are typically represented through unsigned integers within the
> kernel though some of them have user friendly names if they are exposed
> via cpuinfo. Show the friendlier name if available otherwise display
> the X86_FEATURE_* numerals to make it easier to identify the feature.
>
> Suggested-by: Tony Luck <tony.luck@xxxxxxxxx>
> Suggested-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Signed-off-by: Sohil Mehta <sohil.mehta@xxxxxxxxx>
> ---
> v5: Don't disable features with unmet dependencies (Ingo)
> Use pr_warn_once() for the warning (Ingo)
>
> v4: https://lore.kernel.org/lkml/20241210224037.3052555-1-sohil.mehta@xxxxxxxxx/
> Update the log level to pr_info() (Ingo)
> Update the char buffer size to 16 (0day warning)
> Dropped Dave's review tag since the code has changed a bit more
> than usual.
>
> v3: Picked up the review tag.
> Reworded the commit message.
>
> v2: Use cpu_has() instead of boot_cpu_has() (Sean)
> ---
> arch/x86/include/asm/cpufeature.h | 1 +
> arch/x86/kernel/cpu/common.c | 4 ++++
> arch/x86/kernel/cpu/cpuid-deps.c | 35 +++++++++++++++++++++++++++++++
> 3 files changed, 40 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index de1ad09fe8d7..1bb30e82ea77 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -149,6 +149,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
>
> extern void setup_clear_cpu_cap(unsigned int bit);
> extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
> +void scan_feature_dependencies(struct cpuinfo_x86 *c);
>
> #define setup_force_cpu_cap(bit) do { \
> \
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 7cce91b19fb2..52111d97c8ae 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1610,6 +1610,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>
> c->cpu_index = 0;
> filter_cpuid_features(c, false);
> + scan_feature_dependencies(c);
>
> if (this_cpu->c_bsp_init)
> this_cpu->c_bsp_init(c);
> @@ -1870,6 +1871,9 @@ static void identify_cpu(struct cpuinfo_x86 *c)
> /* Filter out anything that depends on CPUID levels we don't have */
> filter_cpuid_features(c, true);
>
> + /* Scan for unmet dependencies based on the CPUID dependency table */
> + scan_feature_dependencies(c);

s/scane_feature_dependencies
/x86_check_cpufeature_deps

or so.

> +/*
> + * Return the feature "name" if available otherwise return
> + * the X86_FEATURE_* numerals to make it easier to identify
> + * the feature.

s/available otherwise
/available, otherwise

> + */
> +static const char *x86_feature_name(unsigned int feature, char *buf)
> +{
> + if (x86_cap_flags[feature])
> + return x86_cap_flags[feature];
> +
> + snprintf(buf, 16, "%d*32+%2d", feature / 32, feature % 32);
> +
> + return buf;
> +}
> +
> +void scan_feature_dependencies(struct cpuinfo_x86 *c)
> +{
> + char feature_buf[16], depends_buf[16];
> + const struct cpuid_dep *d;
> +
> + for (d = cpuid_deps; d->feature; d++) {
> + if (cpu_has(c, d->feature) && !cpu_has(c, d->depends)) {
> + /*
> + * Only warn about the first unmet dependency on the
> + * first CPU where it is encountered to avoid spamming
> + * the kernel log.
> + */
> + pr_warn_once("CPU%d: feature:%s may not work properly without feature:%s\n",
> + smp_processor_id(),
> + x86_feature_name(d->feature, feature_buf),
> + x86_feature_name(d->depends, depends_buf));

I'd make this a bit less passive-aggressive, something like:

x86 CPU feature dependency check failure: CPU%d has '%s' enabled but '%s' disabled. Kernel might be fine, but no guarantees.

Because initially most of these warnings will be about quirks and
oddities that happen to work fine in the current code.

Thanks,

Ingo