Re: [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout

From: Alexander Holler
Date: Sat May 31 2014 - 02:46:35 EST


On 31.05.2014 07:28, Marcel Holtmann wrote:
> Hi Alexander,
>
>> I assume the timeout for processing HCI commands was originally intended to
>> detect hung bluetooth devices and should not include the time needed locally
>> to handle the response to an HCI command. That is important because the time
>> needed locally (by the kernel or even userland) to process responses to HCI
>> commands varies a lot between systems and HCI commands. That's even more true
>> since many reactions to HCI command responses are handled inside work items,
>> which might be delayed quite some time, depending on the actual system load.
>>
>> So stop the timeout as soon as a response to an HCI command was received.
>>
>> This fixes various problems which resulted in HCI command timeouts and a
>> bluetooth stack that no longer worked afterwards, especially on slower
>> systems like some ARM devices.
>>
>> Drawback is that in-kernel problems like deadlocks aren't detected by HCI
>> command timeouts anymore, but such problems should be detected and handled
>> by other means and not by a timeout where it is hard to specify a value
>> reasonable for all possible system configurations and loads.
>>
>> Furthermore, if the timeout includes local processing of HCI command
>> responses, in-kernel errors like hung tasks might be masked by the
>> timeout, because the hung task would be killed by the timeout before
>> the hung task would be detected (by other means).
>>
>> Signed-off-by: Alexander Holler <holler@xxxxxxxxxxxxx>
>> ---
>> net/bluetooth/hci_event.c | 12 ++++++------
>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>> index 15010a2..94c2dc0 100644
>> --- a/net/bluetooth/hci_event.c
>> +++ b/net/bluetooth/hci_event.c
>> @@ -2338,6 +2338,9 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
>>
>> 	opcode = __le16_to_cpu(ev->opcode);
>>
>> +	if (opcode != HCI_OP_NOP)
>> +		del_timer(&hdev->cmd_timer);
>> +
>
> So I actually wonder if we should move away from a timer and use a delayed work item to handle the timeout, and whether that would actually fix this issue.

The problem is that I have absolutely no clue where these timeouts come
from. They appear for different commands and almost always only at
boot. If the machine came up without HCI command timeouts, there never
was one afterwards. So I dug around in the dark, and the above patch
was one of the results. But I still had to increase the command timeout.
And I can't do much testing, as this box is in use. I just experience
this problem almost always when I boot it (to do kernel updates), and
have for a very long time (more than a year, I think).
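
To illustrate the reasoning in the patch description, here is a minimal
user-space sketch (not kernel code). It compares a timeout that covers
both the device's response latency and the local handling with one that
stops as soon as the response arrives. The struct, the function names
and the 2000 ms value are assumptions made up for this illustration.

```c
#include <stdbool.h>

/* Assumed timeout for this sketch only, in milliseconds. */
#define CMD_TIMEOUT_MS 2000u

/* Hypothetical record of one HCI command round trip (milliseconds). */
struct cmd_trace {
	unsigned int device_latency_ms;   /* command sent -> response received */
	unsigned int local_processing_ms; /* response received -> handling done */
};

/* Timer keeps running until the response has been handled locally. */
static inline bool times_out_including_local(const struct cmd_trace *t,
					     unsigned int timeout_ms)
{
	return t->device_latency_ms + t->local_processing_ms > timeout_ms;
}

/* Timer is stopped as soon as the response is received. */
static inline bool times_out_device_only(const struct cmd_trace *t,
					 unsigned int timeout_ms)
{
	return t->device_latency_ms > timeout_ms;
}
```

With a device that answers after 300 ms on a host that needs 2500 ms to
handle the response, only the first variant reports a timeout. A device
that never answers in time is reported by both.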

Regards,

Alexander Holler