Re: [PATCH v3] Bluetooth: hci_qca: Bug fixes while collecting controller memory dump
From: gubbaven
Date:  Fri Feb 14 2020 - 06:48:29 EST
Hi Stephen,
On 2020-02-14 07:52, Stephen Boyd wrote:
Quoting Venkata Lakshmi Narayana Gubba (2020-02-13 07:56:04)
This patch will fix the below issues
   1.Fixed race conditions while accessing memory dump state flags.
What sort of race condition?
[Venkat]:
The mutex is added to avoid a race between qca_hw_error() and
qca_controller_memdump() when they access the memory dump buffers.
In the timeout scenario, qca_hw_error() frees the memory dump buffers
while qca_controller_memdump() might still be accessing them.
Taking the mutex around both accesses avoids this.
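As a rough userspace illustration of that locking (a sketch only, not the
driver code: struct dump_ctx, dump_append() and dump_abort() are
hypothetical names, and a pthread mutex stands in for the kernel mutex),
both paths take the same lock before touching the buffer:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the hci_qca structures. */
enum dump_state { DUMP_COLLECTING, DUMP_COLLECTED, DUMP_TIMEOUT };

struct dump_ctx {
	pthread_mutex_t lock;	/* plays the role of the memdump mutex */
	enum dump_state state;
	unsigned char *buf;	/* dump buffer shared by both paths */
	size_t size;
	size_t used;
};

int dump_ctx_init(struct dump_ctx *c, size_t size)
{
	if (pthread_mutex_init(&c->lock, NULL) != 0)
		return -1;
	c->buf = malloc(size);
	if (!c->buf) {
		pthread_mutex_destroy(&c->lock);
		return -1;
	}
	c->size = size;
	c->used = 0;
	c->state = DUMP_COLLECTING;
	return 0;
}

/* Collection path: append one chunk unless the dump was torn down. */
int dump_append(struct dump_ctx *c, const void *data, size_t len)
{
	int ret = -1;

	pthread_mutex_lock(&c->lock);
	assert(c->used <= c->size);
	if (c->state == DUMP_COLLECTING && c->buf &&
	    len <= c->size - c->used) {
		memcpy(c->buf + c->used, data, len);
		c->used += len;
		ret = 0;
	}
	pthread_mutex_unlock(&c->lock);
	return ret;
}

/* Timeout / hw error path: free the buffer if collection never completed. */
void dump_abort(struct dump_ctx *c)
{
	pthread_mutex_lock(&c->lock);
	if (c->state != DUMP_COLLECTED) {
		free(c->buf);
		c->buf = NULL;
		c->used = 0;
		c->state = DUMP_TIMEOUT;
	}
	pthread_mutex_unlock(&c->lock);
}
```

With this shape, a chunk that arrives after the abort path has run is
rejected instead of being copied into freed memory.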
   2.Updated with actual context of timer in hci_memdump_timeout()
What does this mean?
[Venkat]:
I will update the commit text and post it in the next patch set.
   3.Updated injecting hardware error event if the dumps failed to receive.
   4.Once timeout is triggered, stopping the memory dump collections.
Possible scenarios while collecting memory dump:
Scenario 1:
Memdump event from firmware
Some number of memdump events with seq #
Hw error event
Reset
Scenario 2:
Memdump event from firmware
Some number of memdump events with seq #
Timeout schedules hw_error_event if hw error event is not received already
hw_error_event clears the memdump activity
reset
Scenario 3:
hw_error_event sends memdump command to firmware and waits for completion
Some number of memdump events with seq #
hw error event
reset
Fixes: d841502c79e3 ("Bluetooth: hci_qca: Collect controller memory dump during SSR")
Reported-by: Abhishek Pandit-Subedi <abhishekpandit@xxxxxxxxxxxx>
Signed-off-by: Venkata Lakshmi Narayana Gubba <gubbaven@xxxxxxxxxxxxxx>
---
[...]
@@ -1449,6 +1465,23 @@ static void qca_hw_error(struct hci_dev *hdev, u8 code)
                bt_dev_info(hdev, "waiting for dump to complete");
                qca_wait_for_dump_collection(hdev);
        }
+
+       if (qca->memdump_state != QCA_MEMDUMP_COLLECTED) {
+               bt_dev_err(hu->hdev, "clearing allocated memory due to memdump timeout");
+               mutex_lock(&qca->hci_memdump_lock);
Why is a mutex needed? Are crashes happening in parallel? It would be
nice if the commit text mentioned why the mutex is added so that the
reader doesn't have to figure it out.
[Venkat]: Explained in my answer above.
+               if (qca_memdump)
+                       memdump_buf = qca_memdump->memdump_buf_head;
Regards,
Lakshmi Narayana.