[PATCH 3/3] habanalabs: increase timeout if working with simulator

From: Oded Gabbay
Date: Wed May 01 2019 - 13:45:10 EST


From: Dalit Ben Zoor <dbenzoor@xxxxxxxxx>

Where there is a spike in the CPU consumption, it may cause
random failures in the C/I since the KMD timeout for CPU
and/or QMAN0 jobs expires and it stops communicating to the simulator.
This commit fixes it by increasing timeout on polling functions
if working with simulator.

Signed-off-by: Dalit Ben Zoor <dbenzoor@xxxxxxxxx>
Signed-off-by: Oded Gabbay <oded.gabbay@xxxxxxxxx>
---
drivers/misc/habanalabs/device.c | 8 +++++++-
drivers/misc/habanalabs/habanalabs.h | 7 ++++++-
2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/device.c b/drivers/misc/habanalabs/device.c
index 0e0b9ec71c80..91a9e47a3482 100644
--- a/drivers/misc/habanalabs/device.c
+++ b/drivers/misc/habanalabs/device.c
@@ -1147,7 +1147,13 @@ int hl_poll_timeout_memory(struct hl_device *hdev, u64 addr,
* either by the direct access of the device or by another core
*/
u32 *paddr = (u32 *) (uintptr_t) addr;
- ktime_t timeout = ktime_add_us(ktime_get(), timeout_us);
+ ktime_t timeout;
+
+ /* timeout should be longer when working with simulator */
+ if (!hdev->pdev)
+ timeout_us *= 10;
+
+ timeout = ktime_add_us(ktime_get(), timeout_us);

might_sleep();

diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index 0da80e8eab42..71243b319920 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -1042,7 +1042,12 @@ void hl_wreg(struct hl_device *hdev, u32 reg, u32 val);

#define hl_poll_timeout(hdev, addr, val, cond, sleep_us, timeout_us) \
({ \
- ktime_t __timeout = ktime_add_us(ktime_get(), timeout_us); \
+ ktime_t __timeout; \
+ /* timeout should be longer when working with simulator */ \
+ if (hdev->pdev) \
+ __timeout = ktime_add_us(ktime_get(), timeout_us); \
+ else \
+ __timeout = ktime_add_us(ktime_get(), (timeout_us * 10)); \
might_sleep_if(sleep_us); \
for (;;) { \
(val) = RREG32(addr); \
--
2.17.1