Re: [PATCH 4/4] x86/cpuid: check for dependency violations in CPUID and attempt to fix them
From: Dave Hansen
Date: Wed Jun 22 2022 - 13:19:05 EST
On 6/22/22 10:09, Maxim Levitsky wrote:
> On Wed, 2022-06-22 at 08:32 -0700, Dave Hansen wrote:
>> On 6/22/22 07:48, Maxim Levitsky wrote:
>>> Due to configuration bugs, sometimes a CPU feature is disabled in CPUID,
>>> but not the features that depend on it.
>>>
>>> While the above is not supported, the kernel should try not to crash,
>>> and clearing the dependent cpu caps is the best way to do it.
>>
>> That's a rather paltry changelog.
>>
>> If I remember correctly, there's a crystal clear problem:
>>
>> If a CPU enumerates support for AVX2 but not AVX via CPUID, the
>> kernel crashes.
>>
>> There's also a follow-on problem. The kernel has all the data it needs
>> to fix this, but just doesn't consult it:
>>
>> To make matters worse, the kernel _knows_ that this is an ill-
>> advised situation: The kernel prevents itself from clearing the
>> software representation of the AVX CPUID bit without also
>> clearing AVX2.
>>
>> But, the kernel only consults this knowledge when it is clearing
>> cpu_cap bits. It does not consult this information when it is
>> populating those cpu_cap bits.
>
> Yes, I agree. I'll update the changelog with something more in depth.
>
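FWIW, the data that needs to be consulted at population time already
exists: the dependency table in arch/x86/kernel/cpu/cpuid-deps.c.
Abridged heavily (the real table has many more entries), it looks
something like this:

struct cpuid_dep {
	unsigned int	feature;
	unsigned int	depends;
};

/* Abridged -- each entry says "feature requires depends": */
static const struct cpuid_dep cpuid_deps[] = {
	{ X86_FEATURE_AVX,	X86_FEATURE_XSAVE	},
	{ X86_FEATURE_FMA,	X86_FEATURE_AVX		},
	{ X86_FEATURE_AVX2,	X86_FEATURE_AVX		},
	{}	/* terminator */
};

So the AVX2->AVX relationship is sitting right there; the kernel just
never cross-checks it against the bits it reads out of CPUID.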
>>
>>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>>> index 4cc79971d2d847..c83a8f447d6aed 100644
>>> --- a/arch/x86/kernel/cpu/common.c
>>> +++ b/arch/x86/kernel/cpu/common.c
>>> @@ -1469,7 +1469,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>>> this_cpu->c_early_init(c);
>>>
>>> c->cpu_index = 0;
>>> - filter_cpuid_features(c, false);
>>> + filter_cpuid_features(c, true);
>>>
>>> if (this_cpu->c_bsp_init)
>>> this_cpu->c_bsp_init(c);
>>> @@ -1757,7 +1757,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>>> */
>>>
>>> /* Filter out anything that depends on CPUID levels we don't have */
>>> - filter_cpuid_features(c, true);
>>> + filter_cpuid_features(c, false);
>>>
>>> /* If the model name is still unset, do table lookup. */
>>> if (!c->x86_model_id[0]) {
>>
>> While we're at it, could we please rid ourselves of this unreadable
>> mystery true/false gunk?
>
> If I understand the code correctly, it is present to avoid printing the
> warning twice. It used to be a 'warn' parameter, and I changed it to an
> 'early' parameter, inverting its boolean value, because I saw that the
> warning was not printed at all, and I assumed that this is because the
> first, early call already clears the cpuid cap, so the second call
> doesn't trigger the warning.
If the goal is truly to suppress the warning once per dependency, it
would be trivial to do:
struct cpuid_dep {
	unsigned int feature;
	unsigned int depends;
+	bool warned;
};

Then:

	...
	clear_feature(c, d->feature);
+	if (!d->warned)
+		pr_warn(...);
+	d->warned = true;
You could even have two bits if you feel that we need separate warnings
for the hardware (true CPUID) and software (setup_clear_cpu_cap()) checks.
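Completely untested, and the table would obviously have to drop its
const, but the shape of it would be something like this (the helper
name is made up):

static void clear_feature_warn_once(struct cpuinfo_x86 *c,
				    struct cpuid_dep *d)
{
	clear_feature(c, d->feature);

	/* Only warn the first time this dependency entry bites: */
	if (!d->warned)
		pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, it depends on "
			X86_CAP_FMT "\n",
			x86_cap_flag(d->feature), x86_cap_flag(d->depends));
	d->warned = true;
}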
>>> pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, no CPUID level 0x%x\n",
>>> x86_cap_flag(df->feature), df->level);
>>> }
>>> +
>>> + for (d = cpuid_deps; d->feature; d++) {
>>> +
>>> + if (!test_feature(c, d->feature) || test_feature(c, d->depends))
>>> + continue;
>>> +
>>> + clear_feature(c, d->feature);
>>> +
>>> + pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, because it depends on "
>>> + X86_CAP_FMT " which is not supported in CPUID\n",
>>> + x86_cap_flag(d->feature), x86_cap_flag(d->depends));
>>> + }
>>> }
>>
>> do_clear_cpu_cap() does this with a loop, presumably because a later
>> (higher index in the array) feature in cpuid_deps[] could theoretically
>> clear an earlier (lower index) feature.
>
> Sorry, this is my silly mistake. I intended to call clear_cpu_cap()
> here, which will, if needed, disable all the dependencies, so a loop
> doesn't seem to be needed here.
>
> It's not very efficient, but this is only done once per vCPU, so it shouldn't matter.
Right, and it only loops when it is actually clearing features, which
arguably only happens on a broken CPU or hypervisor, and that is rare
too. It's not inefficient in any case that matters.
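In other words, the population-time check can just lean on the existing
recursion in clear_cpu_cap(). A sketch (untested, function name made
up, reusing the test_feature()/clear_feature() helpers from your
patch):

/*
 * Sketch only: after c->x86_capability has been populated from CPUID,
 * make sure no feature is left set while a feature it depends on is
 * clear. clear_cpu_cap() already walks cpuid_deps[] and clears
 * everything that (transitively) depends on the cleared feature, so
 * no extra loop is needed here.
 */
static void filter_dependent_features(struct cpuinfo_x86 *c)
{
	const struct cpuid_dep *d;

	for (d = cpuid_deps; d->feature; d++) {
		if (!test_feature(c, d->feature) || test_feature(c, d->depends))
			continue;

		pr_warn("CPU: CPU feature " X86_CAP_FMT " disabled, it depends on "
			X86_CAP_FMT " which is not set in CPUID\n",
			x86_cap_flag(d->feature), x86_cap_flag(d->depends));

		clear_cpu_cap(c, d->feature);
	}
}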