Re: [RFC PATCH] PCI: pciehp: Generate a RAS tracepoint for hotplug event

From: Lukas Wunner
Date: Sat Nov 09 2024 - 12:53:08 EST


On Fri, Nov 08, 2024 at 11:09:39AM +0800, Shuai Xue wrote:
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -19,6 +19,7 @@
> #include <linux/types.h>
> #include <linux/pm_runtime.h>
> #include <linux/pci.h>
> +#include <ras/ras_event.h>
> #include "pciehp.h"

Hm, why does the TRACE_EVENT() definition have to live in ras_event.h?
Why not, say, in pciehp.h?


> @@ -245,6 +246,8 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  		if (events & PCI_EXP_SLTSTA_PDC)
>  			ctrl_info(ctrl, "Slot(%s): Card not present\n",
>  				  slot_name(ctrl));
> +		trace_pciehp_event(dev_name(&ctrl->pcie->port->dev),
> +				   slot_name(ctrl), ON_STATE, events);
>  		pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>  		break;
>  	default:

I'd suggest using pci_name() instead of dev_name() as it's a little shorter.

Passing ON_STATE here isn't always accurate because there's
"case BLINKINGOFF_STATE" with a fallthrough preceding the
above code block.
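
To illustrate, here is a rough userspace sketch of the control flow
(not kernel code; the enum values are simplified stand-ins and the
helper name is made up for this example). It returns the state the
slot was actually in when the removal path is entered:

```c
/* Simplified stand-ins for the pciehp slot states (not the kernel's definitions). */
enum slot_state {
	OFF_STATE,
	BLINKINGON_STATE,
	BLINKINGOFF_STATE,
	POWERON_STATE,
	POWEROFF_STATE,
	ON_STATE,
};

/*
 * Hypothetical helper mirroring the shape of the switch around the
 * quoted hunk. Returns the state the slot was in when the removal
 * path was entered, or -1 if the removal path is not taken.
 */
static int removal_path_old_state(enum slot_state state)
{
	switch (state) {
	case BLINKINGOFF_STATE:
		/* the driver cancels the pending button work here */
		/* fall through */
	case ON_STATE:
		return state;	/* may be BLINKINGOFF_STATE, not ON_STATE */
	default:
		return -1;
	}
}
```

Called with BLINKINGOFF_STATE, this returns BLINKINGOFF_STATE, whereas
a hardcoded ON_STATE in the trace call would report the wrong old state.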

Wouldn't it be more readable to just log the event that occurred
as a string, e.g. "Surprise Removal" (and "Insertion" or "Hot Add"
for the other trace event you're introducing) instead of the state?

Otherwise you see "ON_STATE" in the log but that's actually the
*old* value so you have to mentally convert this to "previously ON,
so now must be transitioning to OFF".
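
Something along these lines is what I have in mind, shown as a
userspace sketch rather than an actual TRACE_EVENT() definition.
The enum, the helper names and the output layout are all assumptions
for illustration, not taken from the patch:

```c
#include <stdio.h>

/* Hypothetical event kinds; the names are illustrative, not from the patch. */
enum hp_event {
	HP_INSERTION,
	HP_SURPRISE_REMOVAL,
};

/* Map the event that occurred to a string an admin can read directly. */
static const char *hp_event_str(enum hp_event ev)
{
	switch (ev) {
	case HP_INSERTION:
		return "Insertion";
	case HP_SURPRISE_REMOVAL:
		return "Surprise Removal";
	}
	return "Unknown";
}

/* Format one log line; port and slot are whatever names the caller has. */
static int format_hp_event(char *buf, size_t len, const char *port,
			   const char *slot, enum hp_event ev)
{
	return snprintf(buf, len, "%s Slot(%s): %s", port, slot,
			hp_event_str(ev));
}
```

With a made-up port name "0000:00:1c.0" and slot "4", a surprise
removal would then read "0000:00:1c.0 Slot(4): Surprise Removal",
with no state to decode.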

I'm fine with adding tracepoints to pciehp, I just want to make sure
we do it in a way that's easy for admins to parse.

Thanks,

Lukas