Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging

From: Chang S. Bae
Date: Wed Nov 06 2024 - 13:28:42 EST


On 11/4/2024 3:16 AM, Borislav Petkov wrote:
On Tue, Oct 01, 2024 at 09:10:39AM -0700, Chang S. Bae wrote:
+
+static bool need_staging(u64 *mmio_addrs, u64 pa)
+{
+ unsigned int i;
+
+ for (i = 0; mmio_addrs[i] != 0; i++) {
+ if (mmio_addrs[i] == pa)
+ return false;
+ }
+ mmio_addrs[i] = pa;

This is a weird function - its name is supposed to mean it queries something
but then it has side effects too.

I think I've carved out a bit more out of the loop and convoluted them
into a single function.

Instead, let the helper simply find the position in the array,

static unsigned int find_pos(u64 *mmio_addrs, u64 mmio_pa)
{
unsigned int i;

for (i = 0; mmio_addrs[i] != 0; i++) {
if (mmio_addrs[i] == mmio_pa)
break;
}
return i;
}

and update the array from there:

static void stage_microcode(void)
{
unsigned int pos;
...
for_each_cpu(cpu, cpu_online_mask) {
/* set 'mmio_pa' from RDMSR */

pos = find_pos(mmio_addrs, mmio_pa);
if (mmio_addrs[pos] == mmio_pa)
continue;

/* do staging */

mmio_addrs[pos] = mmio_pa;
}
...
}

Or, even the loop code can include it:

for_each_cpu(...) {
/* set 'mmio_pa' from RDMSR */

/* Find either the last position or a match */
for (i = 0; mmio_addrs[i] != 0 && mmio_addrs[i] != mmio_pa; i++);

if (mmio_addrs[i] == mmio_pa)
continue;

/* do staging */

mmio_addrs[i] = mmio_pa;
}

Maybe, if this unusual one line for-loop is not an issue, this can make
the code simple. Otherwise, at least the helper should remain doing one
thing.

>> + for_each_cpu(cpu, cpu_online_mask) {

Oh great, and someone went and offlined one of those CPUs right here. Fun.

Yes, offlining matters. Let me summarize a few things related to the
patchset for the record:

* The staging flow is tiggered by load_late_locked(), which already
holds cpus_read_lock() [1].

* When any SMT primary threads is offlined, setup_cpus() makes it
exit right away [2].

* When an offlined CPU is brought up online, it follows the early
loading path, which does not include staging.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/microcode/core.c#n705
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/microcode/core.c#n658


+ mmio_pa = staging_addr(cpu);
+
+ if (need_staging(mmio_addrs, mmio_pa) &&
+ !staging_work(mmio_pa, ucode_patch_late, totalsize)) {

do_stage()

+ pr_err("Error: staging failed.\n");

... on CPU%d, err_val: 0x%x"\n"

perhaps?

For more info debugging something like that?

Maybe, either one of two ways:

The work function remains returning bool value, then it can print the
error message from there.

bool do_stage(...)
{
...
pr_err("Error: staging failed on CPU%d, with %s.\n",
smp_processor_id(), state == UCODE_TIMEOUT ? "timeout" : "error");
return false
}

Then, this function may simply return on error.

static void stage_microcode(void)
{
...
if (!do_stage(...))
goto out;
...
out:
kfree(mmio_addrs);
}

Or, let the work function return a value like an enum ucode_state
indicating the status, and print the error message here.

static void stage_microcode(void)
{
enum ucode_state state;

...
state = do_stage(...);
if (state != UCODE_OK)
goto err_out;

err_out:
pr_err("Error: ...);
out:
...
}


+ goto out;
+ }
+ }
+
+ pr_info("Staging succeeded.\n");

"Staging of patch revision 0x%x succeeded.\n"...

more user-friendly.

Yes, that seems to be useful to put it on:

+ pr_info("Staging of patch revision 0x%x succeeded.\n",
+ ((struct microcode_header_intel *)ucode_patch_late)->rev);

If successful, users will see something like this:

[ 25.352716] microcode: Staging is available.
[ 25.357684] microcode: Current revision: 0x1234
[ 136.203269] microcode: Staging of patch revision 0xabcd succeeded.
[ 137.398491] microcode: load: updated on xx primary CPUs with yy siblings
[ 137.427386] microcode: revision: 0x1234 -> 0xabcd

Thanks,
Chang