Re: nvme: Incorrect Async Event Trace string

From: Keith Busch
Date: Wed Apr 05 2023 - 17:30:58 EST


On Wed, Apr 05, 2023 at 08:46:36PM +0000, Nate Thornton wrote:
> Hi Keith,
>
> With NVMe Asynchronous Event kernel debug tracing, there are duplicate entries in the symbolic table that cause the decoded event string to be incorrect for certain events. This is easily observed by enabling SMART / Health events, then lowering the temperature threshold so the drive will generate an event. I observed the issue on the 5.15 kernel and the trace macro remains the same in 6.x
>
> I am using a Samsung PM173X drive in my testing, but any drive that supports asynchronous events and temperature readings should suffice.
>
> e.g.
> # Enable NVMe Async Event debug tracing
> $ echo 1 > /sys/kernel/debug/tracing/events/nvme/nvme_async_event/enable
>
> # Enable Asynchronous Event Notification for SMART / Critical Health Warnings (bits 0:7)
> $ nvme set-feature /dev/nvme0 --feature-id=0xB --value=0xF
> set-feature:0x0b (Async Event Configuration), value:0x0000000f, cdw12:00000000, save:0
>
> # Reduce Temperature Threshold to 64 Kelvin, thus triggering an event (unless it's cold)
> $ nvme set-feature /dev/nvme0 --feature-id=0x4 --value=0x40
>
> # Observe that the Asynchronous Event Notification (AEN) is 0x000001. This corresponds to a SMART / Health status
> # but is labeled as NVME_AER_NOTICE_FW_ACT_STARTING
> $ cat /sys/kernel/debug/tracing/trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 1/1 #P:20
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / _-=> migrate-disable
> # |||| / delay
> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> # | | | ||||| | |
> <idle>-0 [017] d.h.. 745.082580: nvme_async_event: nvme0: NVME_AEN=0x000001 [NVME_AER_NOTICE_FW_ACT_STARTING]
>
>
>
> The AER name duplications are in drivers/nvme/host/trace.h where it tries to combine the Asynchronous Event Type, and types specific to the Notice event type. In my case, I should be seeing NVME_AER_SMART.
> aer_name(NVME_AER_NOTICE_NS_CHANGED), // 0
> aer_name(NVME_AER_NOTICE_ANA), // 3
> aer_name(NVME_AER_NOTICE_FW_ACT_STARTING), // 1
> aer_name(NVME_AER_NOTICE_DISC_CHANGED), // 0xF0
> aer_name(NVME_AER_ERROR), // 0
> aer_name(NVME_AER_SMART), // 1
> aer_name(NVME_AER_CSS), // 6
> aer_name(NVME_AER_VS)) // 7
>
>
> Referring to NVM Express Base Specification, Revision 2.0c, 5.2.1 Command Completion: The Event Type is in the lower bits 0:3 of Dword 0. The Event Type then describes specific Event Information in bits 8:15. (Note that two Event Types are missing: NVME_AER_NOTICE (2) and NVME_AER_IMMEDIATE (3). The other values, 4 and 5, are marked as Reserved in the spec. For completeness, we may want to add the missing definitions.)
>
> In nvme_handle_aen_notice it is decoding bits 8:15 and passing that into the trace. That maybe explains why the NVME_AER_NOTICE_* entries were added to the symbolic names. If this is still desired, it may require a second TRACE_EVENT definition specific to notice events.
>
> Although I am new to the kernel, I am happy to help with a patch if you can describe the desired behavior.

Thanks for the notice, you are correct that the tracing is wrong. I'll get a
fix together shortly. Must have been a mistake to mix the Event Info along with
the Event Type decoding.