Re: [PATCH] EDAC, ghes: use CPER module handles to locate DIMMs

From: James Morse
Date: Thu Aug 30 2018 - 12:32:13 EST


Hi Fan,

On 30/08/18 15:40, wufan wrote:
>>> @@ -327,12 +349,20 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>>> p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
>>> if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
>>> const char *bank = NULL, *device = NULL;
>>> + int index = -1;
>>> +
>>> dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
>>
>>> + p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
>>> + mem_err->mem_dev_handle);
>>> if (bank != NULL && device != NULL)
>>> p += sprintf(p, "DIMM location:%s %s ", bank, device);
>>> - else
>>> - p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
>>> - mem_err->mem_dev_handle);
>>
>> Why do we now print the handle every time? The handle is pretty
>> meaningless, it can only be used to find the location-strings, if we get those
>> we print them instead.
>
> For ghes_edac the bank/device is informational, and nothing would go wrong
> if the bank/device numbers are the same as another entry. But the handle
> is now critical for DIMM lookup, thus pull it out.

Is printing the handle to the kernel log critical?

I'd expect something collecting errors to read from sysfs, not dmesg. I thought
the whole point here was to update the per-DIMM counters in sysfs.
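
To illustrate what reading from sysfs might look like, here is a minimal
userspace sketch. It assumes the standard EDAC layout under
/sys/devices/system/edac/mc/ with a dimm_ce_count file per DIMM; the helper
names (read_edac_counter, dimm_ce_path) and the mc/dimm indices a caller
passes in are hypothetical, and which DIMM directories exist depends on the
platform and driver.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read a single unsigned decimal counter from a sysfs-style file.
 * Returns 0 on success and stores the value in *out, -1 on any failure.
 * (Hypothetical helper, not part of any kernel or library API.)
 */
static int read_edac_counter(const char *path, unsigned long *out)
{
	char buf[32];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*out = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -1;
	return 0;
}

/*
 * Build the path of the corrected-error counter for one DIMM.
 * Assumes the usual EDAC sysfs layout; mc and dimm are indices chosen
 * by the caller. Returns 0 on success, -1 if the buffer is too small.
 */
static int dimm_ce_path(char *buf, size_t len, int mc, int dimm)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/edac/mc/mc%d/dimm%d/dimm_ce_count",
			 mc, dimm);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A collector would call dimm_ce_path() for each DIMM directory it finds and
pass the result to read_edac_counter(); nothing in that path involves parsing
the handle out of dmesg.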


Thanks,

James