changing format/size of data in TRACE_EVENT(extlog_mem_event)

From: Luck, Tony
Date: Wed Jun 24 2015 - 17:57:10 EST


In <ras/ras_event.h> we define a trace event for memory errors.
The last field is:

	__field_struct(struct cper_mem_err_compact, data)

where the structure is defined in <linux/cper.h> as:

struct cper_mem_err_compact {
	__u64	validation_bits;
	__u16	node;
	__u16	card;
	__u16	module;
	__u16	bank;
	__u16	device;
	__u16	row;
	__u16	column;
	__u16	bit_pos;
	__u64	requestor_id;
	__u64	responder_id;
	__u64	target_id;
	__u16	rank;
	__u16	mem_array_handle;
	__u16	mem_dev_handle;
};

This structure was defined based on the useful bits in the
UEFI 2.4 spec appendix N, section 2.5 "Memory Error Section".

But UEFI has since released a new version of the spec, 2.5:

http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf

and things have been updated to cope with ever increasing memory sizes
thanks to Moore's law. The old structure got a couple of tweaks as a
quick band-aid to handle current problems (__u16 isn't big enough for
the "row" entry for some 64GB DIMMs, so they squeezed row bits 17:16 into
a previously reserved field). But looking to the future they also added a
whole new GUID record, "Memory Error Section 2", that increases the width
of the device, row, column, rank and bit_pos fields from u16 to u32 and
adds a couple of completely new fields.
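
To make the band-aid concrete, here is a minimal sketch of how a consumer
could rebuild the full row number from the legacy 16-bit "row" plus the two
extra bits. The helper name, the `extended` parameter and the assumption
that the extra bits sit in bits 0 and 1 of that byte are illustrative only
and should be checked against the spec.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: row bits 16 and 17 arrive in bits 0 and 1 of a separate byte. */
#define ROW_EXT_MASK	0x3u
#define ROW_EXT_SHIFT	16

/* Hypothetical helper: combine the legacy 16-bit row with the extension bits. */
static uint32_t mem_err_full_row(uint16_t row_lo, uint8_t extended)
{
	uint32_t row = ((uint32_t)(extended & ROW_EXT_MASK) << ROW_EXT_SHIFT) | row_lo;

	/* Two extension bits give an 18-bit row number at most. */
	assert(row < (1u << 18));
	return row;
}
```

A __u16 "row" in the trace event has no room for the result, which is why
the field has to grow one way or another.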

So the question is - how can we update the trace event to include these
new wider fields with the minimum pain to applications that look at it?
I don't know if there are any other consumers besides rasdaemon at the
moment ... but we don't want ugly transitions where you have to guess
which version of the application you need to run to work with a given
kernel version.
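
One possible direction, sketched here only to illustrate the problem and
not as a proposal: the per-event "format" file already reports the size of
each field, so a consumer could pick its decoder from the size of "data"
rather than from the kernel version. The two structs below are hypothetical
cut-down layouts (not the real cper.h definitions), and decode_mem_err() is
an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subsets of the old (u16) and new (u32) layouts. */
struct mem_err_v1 { uint16_t device, row, column, rank, bit_pos; };
struct mem_err_v2 { uint32_t device, row, column, rank, bit_pos; };

/*
 * Normalize either layout into the wide form, keyed on the payload size
 * the consumer read from the event format description.
 * Returns 0 on success, -1 if the size matches neither layout.
 */
static int decode_mem_err(const void *data, size_t size, struct mem_err_v2 *out)
{
	assert(data && out);

	if (size == sizeof(struct mem_err_v2)) {
		memcpy(out, data, sizeof(*out));
		return 0;
	}
	if (size == sizeof(struct mem_err_v1)) {
		struct mem_err_v1 old;

		memcpy(&old, data, sizeof(old));
		/* Zero-extend each narrow field into its wide counterpart. */
		out->device  = old.device;
		out->row     = old.row;
		out->column  = old.column;
		out->rank    = old.rank;
		out->bit_pos = old.bit_pos;
		return 0;
	}
	return -1;	/* unknown layout: refuse rather than guess */
}
```

This only works if every layout has a distinct size, and an older consumer
that hard-codes the old struct would still misparse the new one.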

-Tony