Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
From: Chang S. Bae
Date: Wed Nov 06 2024 - 13:28:42 EST
On 11/4/2024 3:16 AM, Borislav Petkov wrote:
On Tue, Oct 01, 2024 at 09:10:39AM -0700, Chang S. Bae wrote:
+
+static bool need_staging(u64 *mmio_addrs, u64 pa)
+{
+ unsigned int i;
+
+ for (i = 0; mmio_addrs[i] != 0; i++) {
+ if (mmio_addrs[i] == pa)
+ return false;
+ }
+ mmio_addrs[i] = pa;
This is a weird function - its name is supposed to mean it queries something
but then it has side effects too.
I think I've carved out a bit more out of the loop and convoluted them
into a single function.
Instead, let the helper simply find the position in the array,
static unsigned int find_pos(u64 *mmio_addrs, u64 mmio_pa)
{
unsigned int i;
for (i = 0; mmio_addrs[i] != 0; i++) {
if (mmio_addrs[i] == mmio_pa)
break;
}
return i;
}
and update the array from there:
static void stage_microcode(void)
{
unsigned int pos;
...
for_each_cpu(cpu, cpu_online_mask) {
/* set 'mmio_pa' from RDMSR */
pos = find_pos(mmio_addrs, mmio_pa);
if (mmio_addrs[pos] == mmio_pa)
continue;
/* do staging */
mmio_addrs[pos] = mmio_pa;
}
...
}
Or, even the loop code can include it:
for_each_cpu(...) {
/* set 'mmio_pa' from RDMSR */
/* Find either the last position or a match */
for (i = 0; mmio_addrs[i] != 0 && mmio_addrs[i] !=
mmio_pa; i++);
if (mmio_addrs[i] == mmio_pa)
continue;
/* do staging */
mmio_addrs[i] = mmio_pa;
}
Maybe, if this unusual one line for-loop is not an issue, this can make
the code simple. Otherwise, at least the helper should remain doing one
thing.
>> + for_each_cpu(cpu, cpu_online_mask) {
Oh great, and someone went and offlined one of those CPUs right here. Fun.
Yes, offlining matters. Let me summarize a few things related to the
patchset for the record:
* The staging flow is tiggered by load_late_locked(), which already
holds cpus_read_lock() [1].
* When any SMT primary threads is offlined, setup_cpus() makes it
exit right away [2].
* When an offlined CPU is brought up online, it follows the early
loading path, which does not include staging.
[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/microcode/core.c#n705
[2]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/microcode/core.c#n658
+ mmio_pa = staging_addr(cpu);
+
+ if (need_staging(mmio_addrs, mmio_pa) &&
+ !staging_work(mmio_pa, ucode_patch_late, totalsize)) {
do_stage()
+ pr_err("Error: staging failed.\n");
... on CPU%d, err_val: 0x%x"\n"
perhaps?
For more info debugging something like that?
Maybe, either one of two ways:
The work function remains returning bool value, then it can print the
error message from there.
bool do_stage(...)
{
...
pr_err("Error: staging failed on CPU%d, with %s.\n",
smp_processor_id(), state == UCODE_TIMEOUT ?
"timeout" : "error");
return false
}
Then, this function may simply return on error.
static void stage_microcode(void)
{
...
if (!do_stage(...))
goto out;
...
out:
kfree(mmio_addrs);
}
Or, let the work function return a value like an enum ucode_state
indicating the status, and print the error message here.
static void stage_microcode(void)
{
enum ucode_state state;
...
state = do_stage(...);
if (state != UCODE_OK)
goto err_out;
err_out:
pr_err("Error: ...);
out:
...
}
+ goto out;
+ }
+ }
+
+ pr_info("Staging succeeded.\n");
"Staging of patch revision 0x%x succeeded.\n"...
more user-friendly.
Yes, that seems to be useful to put it on:
+ pr_info("Staging of patch revision 0x%x succeeded.\n",
+ ((struct microcode_header_intel *)ucode_patch_late)->rev);
If successful, users will see something like this:
[ 25.352716] microcode: Staging is available.
[ 25.357684] microcode: Current revision: 0x1234
[ 136.203269] microcode: Staging of patch revision 0xabcd succeeded.
[ 137.398491] microcode: load: updated on xx primary CPUs with yy siblings
[ 137.427386] microcode: revision: 0x1234 -> 0xabcd
Thanks,
Chang