Re: [PATCH 2/2] PCI/AER: Split the AER stats into multiple sysfs attributes

From: Greg KH
Date: Fri Aug 23 2019 - 13:45:39 EST


On Wed, Aug 21, 2019 at 04:15:13PM -0700, Rajat Jain wrote:
> Split the AER stats into multiple sysfs attributes. Note that
> this changes the ABI of the AER stats, but hopefully, there
> aren't active users that need to change. This is how the AERs
> are being exposed now:
>
> localhost /sys/devices/pci0000:00/0000:00:1c.0/aer_stats # ls -l
> total 0
> -r--r--r--. 1 root root 4096 Aug 20 16:35 correctable_bit0_RxErr
> -r--r--r--. 1 root root 4096 Aug 20 16:35 correctable_bit12_Timeout
> -r--r--r--. 1 root root 4096 Aug 20 16:35 correctable_bit13_NonFatalErr
> -r--r--r--. 1 root root 4096 Aug 20 16:35 correctable_bit14_CorrIntErr
> -r--r--r--. 1 root root 4096 Aug 20 16:35 correctable_bit15_HeaderOF
> -r--r--r--. 1 root root 4096 Aug 20 16:35 correctable_bit6_BadTLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 correctable_bit7_BadDLLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 correctable_bit8_Rollover
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit0_Undefined
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit12_TLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit13_FCP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit14_CmpltTO
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit15_CmpltAbrt
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit16_UnxCmplt
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit17_RxOF
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit18_MalfTLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit19_ECRC
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit20_UnsupReq
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit21_ACSViol
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit22_UncorrIntErr
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit23_BlockedTLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit24_AtomicOpBlocked
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit25_TLPBlockedErr
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit26_PoisonTLPBlocked
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit4_DLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 fatal_bit5_SDES
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit0_Undefined
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit12_TLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit13_FCP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit14_CmpltTO
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit15_CmpltAbrt
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit16_UnxCmplt
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit17_RxOF
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit18_MalfTLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit19_ECRC
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit20_UnsupReq
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit21_ACSViol
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit22_UncorrIntErr
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit23_BlockedTLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit24_AtomicOpBlocked
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit25_TLPBlockedErr
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit26_PoisonTLPBlocked
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit4_DLP
> -r--r--r--. 1 root root 4096 Aug 20 16:35 nonfatal_bit5_SDES
> -r--r--r--. 1 root root 4096 Aug 20 16:35 total_device_err_cor
> -r--r--r--. 1 root root 4096 Aug 20 16:35 total_device_err_fatal
> -r--r--r--. 1 root root 4096 Aug 20 16:35 total_device_err_nonfatal
> -r--r--r--. 1 root root 4096 Aug 20 16:35 total_rootport_err_cor
> -r--r--r--. 1 root root 4096 Aug 20 16:35 total_rootport_err_fatal
> -r--r--r--. 1 root root 4096 Aug 20 16:35 total_rootport_err_nonfatal
> localhost /sys/devices/pci0000:00/0000:00:1c.0/aer_stats #
>
> Each file has a single counter value. A single file containing all
> stats was frowned upon and discussed here:
> https://lkml.org/lkml/2019/6/28/220
>
> Signed-off-by: Rajat Jain <rajatja@xxxxxxxxxx>
> ---
> I personally think that this makes it a little overwhelming for a human,
> e.g. I could look at total but don't exactly know which file to look at
> next in order to drill down. But I couldn't think of any other way. Some
> problems I'd have liked to fix but they require deeper surgery:
>
> * Now each PCI device sysfs node will have a subdirectory called aer_stats.
> (The subdirectory will have attributes only if the device supports AER,
> but the subdirectory will always be present).
>
> * This patch isn't re-using the strings array like it was using earlier.
> I thought of adding the attribute group at run time, so it will take
> care of both the problems, but can only do that after device_add() call,
> I think.
>
> If we are comfortable introducing a call to a new function
> pci_aer_stats_init() after the call to device_add() in pci_device_add(),
> the above problems can be fixed.
>
> drivers/pci/pcie/aer.c | 166 +++++++++++++++++++++++++++++------------
> 1 file changed, 119 insertions(+), 47 deletions(-)
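
For readers following along, the one-counter-per-file layout quoted above could be consumed from userspace roughly as sketched below. This is a minimal illustration and not part of the patch: the helper names read_aer_stats() and nonzero_by_prefix() are hypothetical, and it assumes each attribute holds a single unsigned decimal integer, as the patch description states.

```python
from pathlib import Path


def read_aer_stats(stats_dir):
    """Return {attribute_name: counter} for each file in an aer_stats directory."""
    counters = {}
    for entry in sorted(Path(stats_dir).iterdir()):
        if not entry.is_file():
            continue
        text = entry.read_text().strip()
        # Assumption: one unsigned decimal counter per attribute file.
        if text.isdigit():
            counters[entry.name] = int(text)
    return counters


def nonzero_by_prefix(counters, prefix):
    """Drill down: nonzero counters whose name starts with e.g. 'correctable_'."""
    return {name: value for name, value in counters.items()
            if name.startswith(prefix) and value}
```

With the proposed layout, a caller would pass the directory from the listing, e.g. read_aer_stats("/sys/devices/pci0000:00/0000:00:1c.0/aer_stats"), check a total_* attribute, and then call nonzero_by_prefix() with "correctable_", "fatal_" or "nonfatal_" to see which bits contributed.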

You need a Documentation/ABI/ update for the new sysfs files before we
can properly review this to see if you are doing what you think you are
doing :)

thanks,

greg k-h