Re: [PATCH v9 2/3] Documentation: add a isolation strategy sysfs node for uacce
From: Greg KH
Date: Tue Oct 25 2022 - 09:05:15 EST
On Tue, Oct 25, 2022 at 12:39:30PM +0000, Kai Ye wrote:
> Update documentation describing sysfs node that could help to
> configure isolation strategy for users in the user space. And
> describing sysfs node that could read the device isolated state.
>
> Signed-off-by: Kai Ye <yekai13@xxxxxxxxxx>
> ---
> Documentation/ABI/testing/sysfs-driver-uacce | 27 ++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
> index 08f2591138af..50737c897ba3 100644
> --- a/Documentation/ABI/testing/sysfs-driver-uacce
> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
> @@ -19,6 +19,33 @@ Contact: linux-accelerators@xxxxxxxxxxxxxxxx
> Description: Available instances left of the device
> Return -ENODEV if uacce_ops get_available_instances is not provided
>
> +What: /sys/class/uacce/<dev_name>/isolate_strategy
> +Date: Oct 2022
> +KernelVersion: 6.1
> +Contact: linux-accelerators@xxxxxxxxxxxxxxxx
> +Description: (RW) Configure the frequency size for the hardware error
> + isolation strategy. This unit is the number of times. Number
Number of times what?
> + of occurrences in a period, also means threshold. If the number
> + of device pci AER error exceeds the threshold in a time window,
What is the time window?
> + the device is isolated. This size is a configured integer value.
> + The default is 0. The maximum value is 65535.
> +
> + In the hisilicon accelerator engine, first we will
> + time-stamp every slot AER error. Then check the AER error log
> + when the device AER error occurred. if the device slot AER error
> + count exceeds the preset the number of times in one hour, the
> + isolated state will be set to true. So the device will be
> + isolated. And the AER error log that exceed one hour will be
> + cleared.
This seems like a very hardware-specific implementation here. And this
is supposed to be a generic class?
I feel this is getting really messy :(
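To make the hardware-specific part concrete, the policy in the quoted hunk reduces to a sliding-window threshold check. It keeps a time stamp for every AER error, forgets stamps older than one hour, and marks the device isolated once the count inside the window exceeds the configured value. The following is a minimal sketch under stated assumptions, not the hisi_qm driver code. Every name in it (struct err_isolate, record_aer_error(), expire_old_errors(), ISOLATE_WINDOW_SEC, MAX_ERR_EVENTS) is hypothetical. It also assumes that a threshold of 0 disables isolation, which the quoted text does not actually say.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical sketch of the quoted policy; not the hisi_qm driver code. */
#define ISOLATE_WINDOW_SEC 3600                 /* "one hour" from the quoted text */
#define MAX_ERR_EVENTS ((size_t)UINT16_MAX + 1) /* room for threshold + 1 stamps */

struct err_isolate {
	time_t stamps[MAX_ERR_EVENTS]; /* AER error time stamps, oldest at head */
	size_t head;                   /* index of the oldest stored stamp */
	size_t count;                  /* number of stamps inside the window */
	uint16_t threshold;            /* value written to isolate_strategy */
	bool isolated;                 /* sticky isolated state */
};

/* Forget errors older than the window, relative to 'now'. */
static void expire_old_errors(struct err_isolate *iso, time_t now)
{
	while (iso->count > 0 &&
	       now - iso->stamps[iso->head] > ISOLATE_WINDOW_SEC) {
		iso->head = (iso->head + 1) % MAX_ERR_EVENTS;
		iso->count--;
	}
}

/* Time-stamp one AER error and re-evaluate the isolated state. */
static void record_aer_error(struct err_isolate *iso, time_t now)
{
	expire_old_errors(iso, now);

	/* Ring is full: drop the oldest stamp. Any nonzero threshold was
	 * already exceeded before the ring could fill up. */
	if (iso->count == MAX_ERR_EVENTS) {
		iso->head = (iso->head + 1) % MAX_ERR_EVENTS;
		iso->count--;
	}
	iso->stamps[(iso->head + iso->count) % MAX_ERR_EVENTS] = now;
	iso->count++;

	/* Assumption: a threshold of 0 (the default) disables isolation. */
	if (iso->threshold != 0 && iso->count > iso->threshold)
		iso->isolated = true;
}
```

The one-hour window is hard-coded here, just as it is in the quoted description. Nothing in the proposed sysfs ABI exposes or documents it.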
thanks,
greg k-h