Re: [PATCH 3/8] x86, microcode, intel: clarify log messages

From: Borislav Petkov
Date: Wed Oct 29 2014 - 05:55:13 EST


On Tue, Oct 21, 2014 at 12:13:32PM -0200, Henrique de Moraes Holschuh wrote:
> > > @@ -178,11 +178,11 @@ static int apply_microcode_intel(int cpu)
> > > rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
> > >
> > > if (val[1] != mc_intel->hdr.rev) {
> > > - pr_err("CPU%d update to revision 0x%x failed\n",
> > > + pr_err("CPU%d: update to revision 0x%x rejected by the processor\n",
> > > cpu_num, mc_intel->hdr.rev);
> > > return -1;
> > > }
> > > - pr_info("CPU%d updated to revision 0x%x, date = %04x-%02x-%02x\n",
> > > + pr_info("CPU%d: entire core updated to revision 0x%x, date %04x-%02x-%02x\n",
> >
> > Those two above are not really needed IMO.
>
> They are an attempt to make it more clear. I've got bug reports about this
> sort of issue before in Debian, people do wonder why some "cpus" are
> updated, and others _apparently_ are not (because there are no messages for
> them).

The fact that people open bugs doesn't always mean it is really a bug :-)

And there's a very simple answer to that:

$ grep microcode /proc/cpuinfo | uniq

should give only a single line.

And I'm pretty sure people will ask about "entire core" too. You simply
can't explain to them that the microcode engine is shared between
threads in a oneliner. :-)

Hrrm, I still don't like that "entire core" thing, how about something
like this:

pr_info("CPU%d: update to revision 0x%x, date %04x-%02x-%02x\n"

this doesn't say at least that we're updating a CPU to a revision sth
sth but that we're doing the update on CPU%d. This should clearly
state what happens without making any assumptions about what the core
contains.

> We have more reasons why an update fail. Rejected by the processor is a
> very specific one we want to single out (it basically means we are either
> feeding crap to the processor _or_ violating one of the architectural
> requirements).

That all doesn't matter - you only want to be able to pinpoint which
error message you're looking at and as long as they differ, you're fine.
Otherwise this is just an unnecessary churn.

> It is lowercase because we now prefix all pr_<level>() with "microcode: ",
> so it will look like this:
>
> "microcode: warning: foo"
> "microcode: error: foo".
>
> Should I capitalize this and all other messages of this sort? It doesn't
> look right to me to log "microcode: Error: foo" or "microcode: Warning:
> foo".

Why not? You have the piece of machinery which generates the message,
i.e. "microcode" and then a sentence which starts with a capital letter.
I think this is the accepted practice in the kernel.

> The status-quo confuses people, they wonder why some "cpus" were skipped,
> and they file bugs about it or ask about it in the user forums. I got a bug
> report for a module-based AMD processor, and it took a few round-trips to
> explain what was happening to the user...
>
> If "entire core" is unclear, what wording would you suggest?

See above.

I just don't like the quantification "entire core" because later, when
CPU guys decide to split it more, we'll need to change it again.

> However, it isn't from the initrd: it comes from the early initramfs, which
> is a different kind of beast. We (distros) did have microcode data inside
> the initrd/initramfs for the regular microcode driver, so the previous error
> message could be misleading.

Wait a minute, we do supply the early microcode by adding it to the
initrd: Documentation/x86/early-microcode.txt

Also, what you do on Debian doesn't necessarily mean all distros do it,
does it?

> Would you perfer "warning: failed to save early initramfs microcode update
> data for future use\n" ?

I guess we can say something method-agnostic without explicitly naming
that method...


> > > @@ -72,13 +72,13 @@ int microcode_sanity_check(void *mc, int print_err)
> > > if ((ext_table_size < EXT_HEADER_SIZE)
> > > || ((ext_table_size - EXT_HEADER_SIZE) % EXT_SIGNATURE_SIZE)) {
> > > if (print_err)
> > > - pr_err("error! Small exttable size in microcode data file\n");
> > > + pr_err("error: small exttable size in microcode data file\n");
> >
> > That doesn't tell me a whole lot - maybe "... truncated exttable in microcode data file" ?
>
> If it hit this codepath, it will not be due to "normal" truncation. The

What is a "normal" truncation?

> caller detects truncation earlier, checking the main header total_size
> against the size of the data it got from userspace or the early initramfs.

You mean this:

if (data_size + MC_HEADER_SIZE > total_size)

?

If so, this is not truncation check - it sanity-checks what's in the
headers or the default sizes.


Btw, there's a typo in show_saved_mc()

pr_debug("mc_saved[%d]: sig=0x%x, pf=0x%x, rev=0x%x, toal size=0x%x, date = %04x-%02x-%02x\n",
^^^^
> The message really means: "exttable size incompatible with the size of a
> valid table".

Let's call this:

"Error: extended signature table size mismatch!\n"

because this is what this check does.

> I want to differentiate these two, they're caused by different issues, so
> they should not be the same error message.
>
> Would you prefer "exttable entry counter incompatible with exttable size",
> which is what this message really means?

Yep, that sounds better.

> Well, every other error message in microcode_sanity_check() also means we
> abort just the same. It doesn't make sense to have "abort" just in this
> one.

Ok.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/