Re: [PATCH 3/3] x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error()

From: Borislav Petkov
Date: Fri Jun 14 2024 - 18:44:53 EST


On Fri, Jun 14, 2024 at 05:47:36PM -0400, Yazen Ghannam wrote:
> I don't see why it won't work. If there is no break, then the iterator
> ends by setting the variable past the last valid value.
>
> For example, I ran this on a system with 512 CPUs:
>
> 	unsigned int cpu;
>
> 	/* Loops over CPUs 0-511. */
> 	for_each_possible_cpu(cpu)
> 		pr_info("loop: cpu=%u\n", cpu);
>
> 	/* CPU is now set to 512. */
> 	pr_info("final: cpu=%u\n", cpu);
>
> 	/* CPU 512 is not possible. */
> 	pr_info("CPU %u is %spossible\n", cpu, cpu_possible(cpu) ? "" : "not ");
>
> But...I like your suggestion as it is much more explicit. And I might be
> missing something. :/

I can think of at least three ways the lookup can fail to find a CPU:

* CPU topology and the initial_apicid sometimes can get programmed wrong by the
  FW. Nothing new.

* nr_cpus= - you can enable fewer CPUs than are physically present, so an MCE
  on a CPU which is not enabled by Linux will be -EINVAL

* possible_cpus= - pretty much the same thing

But I haven't actually tried them - am just looking at the code.

And yes, with the apicid_found boolean it is perfectly clear what's going on.
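
To make the explicit-flag pattern concrete, here is a minimal userspace sketch.
It is not the kernel code: the cpu_apicid[] table and find_cpu_by_apicid() are
hypothetical stand-ins for the per-CPU APIC ID lookup and the
for_each_possible_cpu() loop. The point is only that "not found" is reported by
the flag, not inferred from where the iterator happened to stop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel loop; all names here are hypothetical.
 * cpu_apicid[i] holds the APIC ID of logical CPU i.
 */
static bool find_cpu_by_apicid(const unsigned int *cpu_apicid, size_t nr_cpus,
			       unsigned int apicid, unsigned int *cpu_out)
{
	bool apicid_found = false;
	size_t cpu;

	assert(cpu_apicid && cpu_out);

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu_apicid[cpu] == apicid) {
			apicid_found = true;
			break;
		}
	}

	/* Explicit miss: *cpu_out is left untouched. */
	if (!apicid_found)
		return false;

	*cpu_out = (unsigned int)cpu;
	return true;
}
```

A caller would turn a false return into -EINVAL, which covers the FW, nr_cpus=
and possible_cpus= cases above without relying on the final iterator value.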

And looking at

convert_apicid_to_cpu()

which already does that loop, we probably should talk to tglx whether we can
simply export that helper.

And better yet if he's already done some caching of the reverse mapping, apicid
to CPU number, as part of the topology rewrite. Because then we don't need the
loop at all.
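
For illustration only, a reverse map could look roughly like the userspace
sketch below. I don't know what the topology code actually caches, so every
name here (struct apicid_cpu_map, SKETCH_MAX_APICID, the three helpers) is
made up, and a flat array is just the simplest thing that shows the idea: fill
the table once at enumeration time, then the lookup is a bounds check and an
array read.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_APICID 1024	/* hypothetical bound, for illustration only */

/* Hypothetical reverse map: index is the APIC ID, value is the CPU or -1. */
struct apicid_cpu_map {
	int cpu[SKETCH_MAX_APICID];
};

/* Mark every APIC ID as unmapped. */
static void apicid_cpu_map_init(struct apicid_cpu_map *map)
{
	assert(map);
	for (size_t i = 0; i < SKETCH_MAX_APICID; i++)
		map->cpu[i] = -1;
}

/* Record one mapping at enumeration time; out-of-range input is rejected. */
static int apicid_cpu_map_set(struct apicid_cpu_map *map,
			      unsigned int apicid, int cpu)
{
	assert(map);
	if (apicid >= SKETCH_MAX_APICID || cpu < 0)
		return -EINVAL;
	map->cpu[apicid] = cpu;
	return 0;
}

/* O(1) lookup; -EINVAL for out-of-range or never-recorded APIC IDs. */
static int apicid_cpu_map_lookup(const struct apicid_cpu_map *map,
				 unsigned int apicid)
{
	assert(map);
	if (apicid >= SKETCH_MAX_APICID || map->cpu[apicid] < 0)
		return -EINVAL;
	return map->cpu[apicid];
}
```

An APIC ID that was never recorded, e.g. a CPU cut off by nr_cpus=, simply
comes back as -EINVAL here, so the caller needs neither a loop nor a flag.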

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette