[PATCH 01/11] habanalabs/gaudi: trigger state dump in case of SM errors
From: Oded Gabbay
Date: Tue Jul 13 2021 - 03:52:40 EST
From: Ofir Bitton <obitton@xxxxxxxxx>
State dump is relevant to the user in case of Sync Manager error, so
we need to trigger it in that case as well.
Signed-off-by: Ofir Bitton <obitton@xxxxxxxxx>
Reviewed-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
Signed-off-by: Oded Gabbay <ogabbay@xxxxxxxxxx>
---
drivers/misc/habanalabs/gaudi/gaudi.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index fdbe8155ef3c..6cbedeee15d1 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7894,8 +7894,9 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
u32 ctl = le32_to_cpu(eq_entry->hdr.ctl);
u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK)
>> EQ_CTL_EVENT_TYPE_SHIFT);
- u8 cause;
bool reset_required;
+ u8 cause;
+ int rc;
gaudi->events_stat[event_type]++;
gaudi->events_stat_aggregate[event_type]++;
@@ -8081,6 +8082,10 @@ static void gaudi_handle_eqe(struct hl_device *hdev,
gaudi_print_irq_info(hdev, event_type, false);
gaudi_print_sm_sei_info(hdev, event_type,
&eq_entry->sm_sei_data);
+ rc = hl_state_dump(hdev);
+ if (rc)
+ dev_err(hdev->dev,
+ "Error during system state dump %d\n", rc);
hl_fw_unmask_irq(hdev, event_type);
break;
--
2.25.1