RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL spec rev 3.1

From: Shiju Jose
Date: Wed Nov 27 2024 - 13:20:42 EST


Hi Steve,

Thanks for the quick reply.
Please find my reply inline.

>-----Original Message-----
>From: Steven Rostedt <rostedt@xxxxxxxxxxx>
>Sent: 27 November 2024 15:42
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: dave.jiang@xxxxxxxxx; dan.j.williams@xxxxxxxxx; Jonathan Cameron
><jonathan.cameron@xxxxxxxxxx>; alison.schofield@xxxxxxxxx;
>nifan.cxl@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>dave@xxxxxxxxxxxx; linux-cxl@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>Linuxarm <linuxarm@xxxxxxxxxx>; tanxiaofei <tanxiaofei@xxxxxxxxxx>;
>Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>
>Subject: Re: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
>CXL spec rev 3.1
>
>On Wed, 27 Nov 2024 10:12:12 +0000
>Shiju Jose <shiju.jose@xxxxxxxxxx> wrote:
>
>> format:
>> field:unsigned short common_type; offset:0; size:2; signed:0;
>> field:unsigned char common_flags; offset:2; size:1; signed:0;
>> field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
>> field:int common_pid; offset:4; size:4; signed:1;
>>
>> field:__data_loc char[] memdev; offset:8; size:4; signed:0;
>> field:__data_loc char[] host; offset:12; size:4; signed:0;
>> field:int log; offset:16; size:4; signed:1;
>
>> field:uuid_t hdr_uuid; offset:20; size:16; signed:0;
>
>New type for me ;-)
>
>> field:u64 serial; offset:40; size:8; signed:0;
>> field:u32 hdr_flags; offset:48; size:4; signed:0;
>> field:u16 hdr_handle; offset:52; size:2; signed:0;
>> field:u16 hdr_related_handle; offset:54; size:2; signed:0;
>> field:u64 hdr_timestamp; offset:56; size:8; signed:0;
>> field:u8 hdr_length; offset:64; size:1; signed:0;
>> field:u8 hdr_maint_op_class; offset:65; size:1; signed:0;
>> field:u8 hdr_maint_op_sub_class; offset:66; size:1; signed:0;
>> field:u64 dpa; offset:72; size:8; signed:0;
>> field:u8 descriptor; offset:80; size:1; signed:0;
>> field:u8 type; offset:81; size:1; signed:0;
>> field:u8 transaction_type; offset:82; size:1; signed:0;
>> field:u8 channel; offset:83; size:1; signed:0;
>> field:u32 device; offset:84; size:4; signed:0;
>> field:u8 comp_id[16]; offset:88; size:16; signed:0;
>> field:u64 hpa; offset:104; size:8; signed:0;
>> field:uuid_t region_uuid; offset:112; size:16; signed:0;
>> field:u16 validity_flags; offset:128; size:2; signed:0;
>> field:u8 rank; offset:130; size:1; signed:0;
>> field:u8 dpa_flags; offset:131; size:1; signed:0;
>> field:__data_loc char[] region_name; offset:132; size:4; signed:0;
>> field:u8 sub_type; offset:136; size:1; signed:0;
>> field:u8 cme_threshold_ev_flags; offset:137; size:1; signed:0;
>> field:u32 cme_count; offset:140; size:4; signed:0;
>>
>> print fmt: "memdev=%s host=%s serial=%lld log=%s : time=%llu uuid=%pUb
>> len=%d flags='%s' handle=%x related_handle=%x maint_op_class=%u
>> maint_op_sub_class=%u : dpa=%llx dpa_flags='%s' descriptor='%s' type='%s'
>> transaction_type='%s' channel=%u rank=%u device=%x validity_flags='%s'
>> comp_id=%shpa=%llx region=%s region_uuid=%pUb sub_type=%u
>> cme_threshold_ev_flags=%u cme_count=%u", __get_str(memdev),
>> __get_str(host), REC->serial, __print_symbolic(REC->log, {
>> CXL_EVENT_TYPE_INFO, "Informational" }, { CXL_EVENT_TYPE_WARN,
>> "Warning" }, { CXL_EVENT_TYPE_FAIL, "Failure" }, { CXL_EVENT_TYPE_FATAL,
>> "Fatal" }), REC->hdr_timestamp,
>
>
>> &REC->hdr_uuid,
>
>libtraceevent doesn't know about taking an address with '&'.
>
>If I remove it (and the other one below for region_uuid), it parses fine.
>
>I'll have to add this to the library, as it should be able to handle this.
>I bet I also have to add "%pUb" as well.
>
I removed hdr_uuid and region_uuid in the rasdaemon test setup as you
suggested, and with that change libtraceevent parses the format correctly.
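As a quick pre-check before loading a format into libtraceevent, the two
constructs mentioned above (taking a field's address with '&' and the "%pU"
specifier) can be searched for in the print fmt text. The following is only
a minimal sketch: the function name and pattern table are hypothetical, the
patterns cover nothing beyond what this thread reports, and the address-of
match is a heuristic that only looks at the start of an argument.

```python
import re

# Constructs reported in this thread as not handled by libtraceevent.
# Each pattern has exactly one capture group holding the offending text.
SUSPECT_PATTERNS = {
    # '&' at the start of an argument, i.e. right after '(', ',', '?' or ':'.
    # This avoids matching the binary '&' and logical '&&' operators.
    "address-of REC field": re.compile(r"(?:^|[(,?:])\s*(&\s*REC->\w+)"),
    # UUID pointer specifiers such as %pUb.
    "%pU UUID specifier": re.compile(r"(%pU[bBlL]?)"),
}


def find_suspects(print_fmt):
    """Return (label, matched_text) pairs for every suspect construct."""
    hits = []
    for label, pattern in SUSPECT_PATTERNS.items():
        for match in pattern.finditer(print_fmt):
            hits.append((label, match.group(1)))
    return hits


sample = '"uuid=%pUb len=%d", &REC->hdr_uuid, REC->hdr_length'
for label, text in find_suspects(sample):
    print(f"{label}: {text}")
```

Run against the format listings at the end of this message, the first
pattern would also match the &REC->comp_id[1] and &REC->comp_id[7] arguments
passed to __print_hex(). Whether those are what triggers the failure is not
confirmed here.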

However, I encounter a similar parsing error ("FAILED TO PARSE") when I change
two fields in TP_printk() from raw values (%u) to decoded strings (%s). Please
see the attached format file, "format_cxl_general_media_v3.1_basic", for your
reference.

I've also attached another format file, "format_cxl_general_media_v3.1_full",
which contains the complete TP_printk() intended.

Could you please help, or share how to debug these errors in a
libtraceevent setup?
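One way to narrow down which argument breaks the parse, independent of
libtraceevent itself, is to split the print fmt into its format string and
top-level arguments so that they can be counted and then removed one at a
time. The sketch below assumes the print fmt text has already been joined
into a single string. The helper names are hypothetical, and the splitter
only tracks quotes and brackets; it is not a C parser.

```python
def split_print_fmt(text):
    """Split print fmt content into (format_string, [top-level arguments])."""
    parts, current = [], []
    depth = 0          # nesting level of (), {} and []
    in_string = False  # inside a double-quoted string literal
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            current.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep the escaped character and skip over it.
                i += 1
                current.append(text[i])
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch == "," and depth == 0:
            # A comma outside any bracket or string ends an argument.
            parts.append("".join(current).strip())
            current = []
        else:
            if ch in "({[":
                depth += 1
            elif ch in ")}]":
                depth -= 1
            current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return parts[0], parts[1:]


def count_conversions(fmt):
    """Count printf-style conversions, ignoring literal '%%'."""
    return fmt.replace("%%", "").count("%")


sample = (
    '"log=%s len=%d flags=%s", '
    '__print_symbolic(REC->log, { CXL_EVENT_TYPE_INFO, "Informational" }, '
    '{ CXL_EVENT_TYPE_WARN, "Warning" }), '
    'REC->hdr_length, '
    '__print_flags(REC->hdr_flags, " | ", '
    '{ (1UL << 2), "PERMANENT_CONDITION" })'
)
fmt, args = split_print_fmt(sample)
print(count_conversions(fmt), len(args))
for index, arg in enumerate(args, start=1):
    print(index, arg)
```

If the two counts agree, each argument can be paired with its conversion and
dropped in turn from a copy of the format file to see which one the parser
rejects.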

>Thanks,
>
>-- Steve
>
Thanks,
Shiju
root@localhost:~# cat /sys/kernel/debug/tracing/events/cxl/cxl_general_media/format
name: cxl_general_media
ID: 1464
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:__data_loc char[] memdev; offset:8; size:4; signed:0;
field:__data_loc char[] host; offset:12; size:4; signed:0;
field:int log; offset:16; size:4; signed:1;
field:u64 serial; offset:24; size:8; signed:0;
field:u32 hdr_flags; offset:32; size:4; signed:0;
field:u16 hdr_handle; offset:36; size:2; signed:0;
field:u16 hdr_related_handle; offset:38; size:2; signed:0;
field:u64 hdr_timestamp; offset:40; size:8; signed:0;
field:u8 hdr_length; offset:48; size:1; signed:0;
field:u8 hdr_maint_op_class; offset:49; size:1; signed:0;
field:u8 hdr_maint_op_sub_class; offset:50; size:1; signed:0;
field:u64 dpa; offset:56; size:8; signed:0;
field:u8 descriptor; offset:64; size:1; signed:0;
field:u8 type; offset:65; size:1; signed:0;
field:u8 sub_type; offset:66; size:1; signed:0;
field:u8 transaction_type; offset:67; size:1; signed:0;
field:u8 channel; offset:68; size:1; signed:0;
field:u32 device; offset:72; size:4; signed:0;
field:u8 comp_id[16]; offset:76; size:16; signed:0;
field:u64 hpa; offset:96; size:8; signed:0;
field:u16 validity_flags; offset:104; size:2; signed:0;
field:u8 rank; offset:106; size:1; signed:0;
field:u8 dpa_flags; offset:107; size:1; signed:0;
field:u8 cme_threshold_ev_flags; offset:108; size:1; signed:0;
field:u32 cme_count; offset:112; size:4; signed:0;
field:__data_loc char[] region_name; offset:116; size:4; signed:0;

print fmt: "memdev=%s host=%s serial=%lld log=%s : time=%llu len=%d flags='%s' handle=%x related_handle=%x maint_op_class=%u maint_op_sub_class=%u : dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' sub_type='%s' transaction_type='%s' channel=%u rank=%u device=%x validity_flags='%s' comp_id=%s comp_id_pldm_valid_flags='%s' pldm_entity_id=%s pldm_resource_id=%s hpa=%llx cme_threshold_ev_flags='%s' cme_count=%x region=%s", __get_str(memdev), __get_str(host), REC->serial, __print_symbolic(REC->log, { CXL_EVENT_TYPE_INFO, "Informational" }, { CXL_EVENT_TYPE_WARN, "Warning" }, { CXL_EVENT_TYPE_FAIL, "Failure" }, { CXL_EVENT_TYPE_FATAL, "Fatal" }), REC->hdr_timestamp, REC->hdr_length, __print_flags(REC->hdr_flags, " | ", { ((((1UL))) << (2)), "PERMANENT_CONDITION" }, { ((((1UL))) << (3)), "MAINTENANCE_NEEDED" }, { ((((1UL))) << (4)), "PERFORMANCE_DEGRADED" }, { ((((1UL))) << (5)), "HARDWARE_REPLACEMENT_NEEDED" }, { ((((1UL))) << (6)), "MAINT_OP_SUB_CLASS_VALID" } ), REC->hdr_handle, REC->hdr_related_handle, REC->hdr_maint_op_class, REC->hdr_maint_op_sub_class, REC->dpa, __print_flags(REC->dpa_flags, "|", { ((((1UL))) << (0)), "VOLATILE" }, { ((((1UL))) << (1)), "NOT_REPAIRABLE" } ), __print_flags(REC->descriptor, "|", { ((((1UL))) << (0)), "UNCORRECTABLE_EVENT" }, { ((((1UL))) << (1)), "THRESHOLD_EVENT" }, { ((((1UL))) << (2)), "POISON_LIST_OVERFLOW" } ), __print_symbolic(REC->type, { 0x00, "ECC Error" }, { 0x01, "Invalid Address" }, { 0x02, "Data Path Error" }, { 0x03, "TE State Violation" }, { 0x04, "Scrub Media ECC Error" }, { 0x05, "Adv Prog CME Counter Expiration" }, { 0x06, "CKID Violation" } ), __print_symbolic(REC->sub_type, { 0x00, "Not Reported" }, { 0x01, "Internal Datapath Error" }, { 0x02, "Media Link Command Training Error" }, { 0x03, "Media Link Control Training Error" }, { 0x04, "Media Link Data Training Error" }, { 0x05, "Media Link CRC Error" } ), __print_symbolic(REC->transaction_type, { 0x00, "Unknown" }, { 0x01, "Host Read" }, { 0x02, "Host Write" }, 
{ 0x03, "Host Scan Media" }, { 0x04, "Host Inject Poison" }, { 0x05, "Internal Media Scrub" }, { 0x06, "Internal Media Management" }, { 0x07, "Internal Media Error Check Scrub" }, { 0x08, "Media Initialization" } ), REC->channel, REC->rank, REC->device, __print_flags(REC->validity_flags, "|", { ((((1UL))) << (0)), "CHANNEL" }, { ((((1UL))) << (1)), "RANK" }, { ((((1UL))) << (2)), "DEVICE" }, { ((((1UL))) << (3)), "COMPONENT" }, { ((((1UL))) << (4)), "COMPONENT PLDM FORMAT" } ), __print_hex(REC->comp_id, 0x10), __print_flags(REC->comp_id[0], " | ", { ((((1UL))) << (0)), "PLDM Entity ID" }, { ((((1UL))) << (1)), "Resource ID" } ), (REC->validity_flags & ((((1UL))) << (3)) && REC->validity_flags & ((((1UL))) << (4))) ? (REC->comp_id[0] & ((((1UL))) << (0))) ? __print_hex(&REC->comp_id[1], 6) : "0x00" : "0x00", (REC->validity_flags & ((((1UL))) << (3)) && REC->validity_flags & ((((1UL))) << (4))) ? (REC->comp_id[0] & ((((1UL))) << (1))) ? __print_hex(&REC->comp_id[7], 4) : "0x00" : "0x00", REC->hpa, __print_flags(REC->cme_threshold_ev_flags, "|", { ((((1UL))) << (0)), "Corrected Memory Errors in Multiple Media Components" }, { ((((1UL))) << (1)), "Exceeded Programmable Threshold" } ), REC->cme_count, __get_str(region_name)

root@localhost:~# cat /sys/kernel/debug/tracing/events/cxl/cxl_general_media/format
name: cxl_general_media
ID: 1464
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:__data_loc char[] memdev; offset:8; size:4; signed:0;
field:__data_loc char[] host; offset:12; size:4; signed:0;
field:int log; offset:16; size:4; signed:1;
field:u64 serial; offset:24; size:8; signed:0;
field:u32 hdr_flags; offset:32; size:4; signed:0;
field:u16 hdr_handle; offset:36; size:2; signed:0;
field:u16 hdr_related_handle; offset:38; size:2; signed:0;
field:u64 hdr_timestamp; offset:40; size:8; signed:0;
field:u8 hdr_length; offset:48; size:1; signed:0;
field:u8 hdr_maint_op_class; offset:49; size:1; signed:0;
field:u8 hdr_maint_op_sub_class; offset:50; size:1; signed:0;
field:u64 dpa; offset:56; size:8; signed:0;
field:u8 descriptor; offset:64; size:1; signed:0;
field:u8 type; offset:65; size:1; signed:0;
field:u8 transaction_type; offset:66; size:1; signed:0;
field:u8 channel; offset:67; size:1; signed:0;
field:u32 device; offset:68; size:4; signed:0;
field:u8 comp_id[16]; offset:72; size:16; signed:0;
field:u64 hpa; offset:88; size:8; signed:0;
field:u16 validity_flags; offset:96; size:2; signed:0;
field:u8 rank; offset:98; size:1; signed:0;
field:u8 dpa_flags; offset:99; size:1; signed:0;
field:__data_loc char[] region_name; offset:100; size:4; signed:0;
field:u8 sub_type; offset:104; size:1; signed:0;
field:u8 cme_threshold_ev_flags; offset:105; size:1; signed:0;
field:u32 cme_count; offset:108; size:4; signed:0;

print fmt: "memdev=%s host=%s serial=%lld log=%s : time=%llu len=%d flags='%s' handle=%x related_handle=%x maint_op_class=%u maint_op_sub_class=%u : dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u device=%x validity_flags='%s' comp_id=%s hpa=%llx region=%s sub_type='%s' cme_threshold_ev_flags='%s' cme_count=%u", __get_str(memdev), __get_str(host), REC->serial, __print_symbolic(REC->log, { CXL_EVENT_TYPE_INFO, "Informational" }, { CXL_EVENT_TYPE_WARN, "Warning" }, { CXL_EVENT_TYPE_FAIL, "Failure" }, { CXL_EVENT_TYPE_FATAL, "Fatal" }), REC->hdr_timestamp, REC->hdr_length, __print_flags(REC->hdr_flags, " | ", { ((((1UL))) << (2)), "PERMANENT_CONDITION" }, { ((((1UL))) << (3)), "MAINTENANCE_NEEDED" }, { ((((1UL))) << (4)), "PERFORMANCE_DEGRADED" }, { ((((1UL))) << (5)), "HARDWARE_REPLACEMENT_NEEDED" }, { ((((1UL))) << (6)), "MAINT_OP_SUB_CLASS_VALID" } ), REC->hdr_handle, REC->hdr_related_handle, REC->hdr_maint_op_class, REC->hdr_maint_op_sub_class, REC->dpa, __print_flags(REC->dpa_flags, "|", { ((((1UL))) << (0)), "VOLATILE" }, { ((((1UL))) << (1)), "NOT_REPAIRABLE" } ), __print_flags(REC->descriptor, "|", { ((((1UL))) << (0)), "UNCORRECTABLE_EVENT" }, { ((((1UL))) << (1)), "THRESHOLD_EVENT" }, { ((((1UL))) << (2)), "POISON_LIST_OVERFLOW" } ), __print_symbolic(REC->type, { 0x00, "ECC Error" }, { 0x01, "Invalid Address" }, { 0x02, "Data Path Error" }, { 0x03, "TE State Violation" }, { 0x04, "Scrub Media ECC Error" }, { 0x05, "Adv Prog CME Counter Expiration" }, { 0x06, "CKID Violation" } ), __print_symbolic(REC->transaction_type, { 0x00, "Unknown" }, { 0x01, "Host Read" }, { 0x02, "Host Write" }, { 0x03, "Host Scan Media" }, { 0x04, "Host Inject Poison" }, { 0x05, "Internal Media Scrub" }, { 0x06, "Internal Media Management" }, { 0x07, "Internal Media Error Check Scrub" }, { 0x08, "Media Initialization" } ), REC->channel, REC->rank, REC->device, __print_flags(REC->validity_flags, "|", { ((((1UL))) << (0)), "CHANNEL" }, { 
((((1UL))) << (1)), "RANK" }, { ((((1UL))) << (2)), "DEVICE" }, { ((((1UL))) << (3)), "COMPONENT" }, { ((((1UL))) << (4)), "COMPONENT PLDM FORMAT" } ), __print_hex(REC->comp_id, 0x10), REC->hpa, __get_str(region_name), __print_symbolic(REC->sub_type, { 0x00, "Not Reported" }, { 0x01, "Internal Datapath Error" }, { 0x02, "Media Link Command Training Error" }, { 0x03, "Media Link Control Training Error" }, { 0x04, "Media Link Data Training Error" }, { 0x05, "Media Link CRC Error" } ), __print_flags(REC->cme_threshold_ev_flags, "|", { ((((1UL))) << (0)), "Corrected Memory Errors in Multiple Media Components" }, { ((((1UL))) << (1)), "Exceeded Programmable Threshold" } ), REC->cme_count
root@localhost:~#