[PATCH 07/15] habanalabs: add an option to control watchdog timeout via debugfs

From: Oded Gabbay
Date: Thu Oct 27 2022 - 05:11:11 EST


From: Tomer Tayar <ttayar@xxxxxxxxx>

Add an option to control the timeout value for the driver's watchdog
of the reset process. The timeout represents the amount of the user
has to close his process once he gets a device reset notification from
the driver.

Signed-off-by: Tomer Tayar <ttayar@xxxxxxxxx>
Reviewed-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
Signed-off-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
---
Documentation/ABI/testing/debugfs-driver-habanalabs | 7 +++++++
drivers/misc/habanalabs/common/debugfs.c | 5 +++++
2 files changed, 12 insertions(+)

diff --git a/Documentation/ABI/testing/debugfs-driver-habanalabs b/Documentation/ABI/testing/debugfs-driver-habanalabs
index c915bf17b293..85f6d04f528b 100644
--- a/Documentation/ABI/testing/debugfs-driver-habanalabs
+++ b/Documentation/ABI/testing/debugfs-driver-habanalabs
@@ -91,6 +91,13 @@ Description: Enables the root user to set the device to specific state.
Valid values are "disable", "enable", "suspend", "resume".
User can read this property to see the valid values

+What: /sys/kernel/debug/habanalabs/hl<n>/device_release_watchdog_timeout
+Date: Oct 2022
+KernelVersion: 6.2
+Contact: ttayar@xxxxxxxxx
+Description: The watchdog timeout value in seconds for a device relese upon
+ certain error cases, after which the device is reset.
+
What: /sys/kernel/debug/habanalabs/hl<n>/dma_size
Date: Apr 2021
KernelVersion: 5.13
diff --git a/drivers/misc/habanalabs/common/debugfs.c b/drivers/misc/habanalabs/common/debugfs.c
index 48d3ec8b5c82..945c0e6758ca 100644
--- a/drivers/misc/habanalabs/common/debugfs.c
+++ b/drivers/misc/habanalabs/common/debugfs.c
@@ -1769,6 +1769,11 @@ void hl_debugfs_add_device(struct hl_device *hdev)
dev_entry,
&hl_timeout_locked_fops);

+ debugfs_create_u32("device_release_watchdog_timeout",
+ 0644,
+ dev_entry->root,
+ &hdev->device_release_watchdog_timeout_sec);
+
for (i = 0, entry = dev_entry->entry_arr ; i < count ; i++, entry++) {
debugfs_create_file(hl_debugfs_list[i].name,
0444,
--
2.25.1