nvme: Incorrect Async Event Trace string

From: Nate Thornton
Date: Wed Apr 05 2023 - 16:46:49 EST


Hi Keith,

With NVMe Asynchronous Event kernel debug tracing, there are duplicate entries in the symbolic table that cause the decoded event string to be incorrect for certain events. This is easily observed by enabling SMART / Health events, then lowering the temperature threshold so the drive will generate an event. I observed the issue on the 5.15 kernel and the trace macro remains the same in 6.x

I am using a Samsung PM173X drive in my testing, but any drive that supports asynchronous events and temperature readings should suffice.

e.g.
# Enable NVMe Async Event debug tracing
$ echo 1 > /sys/kernel/debug/tracing/events/nvme/nvme_async_event/enable

# Enable Asynchronous Event Notification for SMART / Critical Health Warnings (bits 0:7)
$ nvme set-feature /dev/nvme0 --feature-id=0xB --value=0xF
set-feature:0x0b (Async Event Configuration), value:0x0000000f, cdw12:00000000, save:0

# Reduce Temperature Threshold to 64 Kelvin, thus triggering an event (unless it's cold)
$ nvme set-feature /dev/nvme0 --feature-id=0x4 --value=0x40

# Observe that the Asynchronous Event Notification (AEN) is 0x000001. This corresponds to a SMART / Health status
# but is labeled as NVME_AER_NOTICE_FW_ACT_STARTING
$ cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:20
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
<idle>-0 [017] d.h.. 745.082580: nvme_async_event: nvme0: NVME_AEN=0x000001 [NVME_AER_NOTICE_FW_ACT_STARTING]



The AER name duplications are in drivers/nvme/host/trace.h where it tries to combine the Asynchronous Event Type, and types specific to the Notice event type. In my case, I should be seeing NVME_AER_SMART.
aer_name(NVME_AER_NOTICE_NS_CHANGED), // 0
aer_name(NVME_AER_NOTICE_ANA), // 3
aer_name(NVME_AER_NOTICE_FW_ACT_STARTING), // 1
aer_name(NVME_AER_NOTICE_DISC_CHANGED), // 0xF0
aer_name(NVME_AER_ERROR), // 0
aer_name(NVME_AER_SMART), // 1
aer_name(NVME_AER_CSS), // 6
aer_name(NVME_AER_VS)) // 7


Referring to NVM Express Base Specification, Revision 2.0c, 5.2.1 Command Completion: The Event Type is in the lower bits 0:3 of Dword 0. The Event Type then describes specific Event Information in bits 8:15. (Note that two Event Types are missing: NVME_AER_NOTICE (2) and NVME_AER_IMMEDIATE (3). The other values, 4 and 5, are marked as Reserved in the spec. For completeness, we may want to add the missing definitions.)

In nvme_handle_aen_notice it is decoding bits 8:15 and passing that into the trace. That maybe explains why the NVME_AER_NOTICE_* entries were added to the symbolic names. If this is still desired, it may require a second TRACE_EVENT definition specific to notice events.

Although I am new to the kernel, I am happy to help with a patch if you can describe the desired behavior.

Thanks,
Nate