Re: [PATCH v2 2/8] efi: Decode IA32/X64 Processor Error Section

From: Borislav Petkov
Date: Tue Feb 27 2018 - 13:34:49 EST


On Tue, Feb 27, 2018 at 06:06:21PM +0000, Ghannam, Yazen wrote:
> It's not just about arches but record types. A single platform can report
> using arch-specific records, memory records, PCIe records, etc.
>
> So all the other record types only show fields with a valid bit set and this
> has always been the case. There may be users or tools who expect that
> same behavior.

You know we have this thing called tracepoints for that, don't you?

You define one tracepoint which regurgitates an error record and then
all arches which have the same table, share them.

> Please see the other patches where these are decoded further. As I
> mentioned the cover letter, the decoding is applied incrementally rather
> than having one large patch.

Here's what needs to go:

All the validation bits crap:

printk("%sValidation Bits: 0x%04x\n", pfx, validation_bits);

Instead, you simply dump all entries unconditionally and say whether
they're valid or not. This way you have the same format for each error
record - much easier to work with.

printk("%sCPUID Info:\n", pfx);
print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid,
sizeof(proc->cpuid), 0);

Frankly, I have no clue why we should even dump CPUID for *every* error.
It is completely sufficient to dump family/model/stepping like we do in
mce_amd.c. Besides, when debugging, you collect much more info from the
system - CPUID only is not enough.

printk("%sError Structure Type: %pUl\n", newpfx,
&err_info->err_type);

Unreadable GUIDs.

What are those even:

if (err_info->validation_bits & INFO_VALID_TARGET_ID) {
printk("%sTarget Identifier: 0x%016llx\n",
newpfx, err_info->target_id);
}

if (err_info->validation_bits & INFO_VALID_REQUESTOR_ID) {
printk("%sRequestor Identifier: 0x%016llx\n",
newpfx, err_info->requestor_id);
}

if (err_info->validation_bits & INFO_VALID_RESPONDER_ID) {
printk("%sResponder Identifier: 0x%016llx\n",
newpfx, err_info->responder_id);
}

some 8-byte ids? Who knows... Either remove them or make them
human-readable, *explaining* what the requestor is and the target is and
so on.

printk("%sRegister Array Size: 0x%04x\n", newpfx,
ctx_info->reg_arr_size);

I have no clue what that is for.

if (ctx_info->reg_ctx_type == CTX_TYPE_MSR) {
groupsize = 8; /* MSRs are 8 bytes wide. */
printk("%sMSR Address: 0x%08x\n", newpfx,
ctx_info->msr_addr);
}

What do I need the MSR *addresses* for?

if (ctx_info->reg_ctx_type == CTX_TYPE_MMREG) {
printk("%sMM Register Address: 0x%016llx\n", newpfx,
ctx_info->mm_reg_addr);
}

memory mapped registers?!?!

> Also, like I said in another thread, we should always print the raw value
> followed by the decoding. That way the raw info is there in case a user
> wants to send the data to the vendor or other expert party.

By that logic we shouldn't be decoding at all - we should be dumping a
fat hex blob.

Again, there's this thing called tracepoints which have exactly that,
you know. You can dump human readable info to dmesg and dump a whole raw
record into the tracepoint.

When the error is critical and we're about to die, the tracepoint
contents goes to dmesg too.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix ImendÃrffer, Jane Smithard, Graham Norton, HRB 21284 (AG NÃrnberg)
--