Re: [PATCH] Bluetooth: Ignore cmd_timeout with HCI_USER_CHANNEL
From: Abhishek Pandit-Subedi
Date: Mon Aug 08 2022 - 17:17:39 EST
On Mon, Aug 8, 2022 at 1:31 PM Luiz Augusto von Dentz
<luiz.dentz@xxxxxxxxx> wrote:
>
> Hi Abhishek,
>
> On Mon, Aug 8, 2022 at 11:04 AM Abhishek Pandit-Subedi
> <abhishekpandit@xxxxxxxxxx> wrote:
> >
> > From: Abhishek Pandit-Subedi <abhishekpandit@xxxxxxxxxxxx>
> >
> > When using HCI_USER_CHANNEL, hci traffic is expected to be handled by
> > userspace entirely. However, it is still possible (and sometimes
> > desirable) to be able to send commands to the controller directly. In
> > such cases, the kernel command timeout shouldn't do any error handling.
> >
> > Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@xxxxxxxxxxxx>
> > ---
> > This was tested by running a userchannel stack and sending commands via
> > hcitool cmd on an Intel AX200 controller. Without this change, each
> > command sent via hcitool would result in hci_cmd_timeout being called
> > and after 5 commands, the controller would reset.
> >
> > Hcitool continues working here because it marks the socket as
> > promiscuous and gets a copy of all traffic while the socket is open (and
> > does filtering in userspace).
>
> There is something not quite right here, if you have a controller
> using user_channel (addr.hci_channel = HCI_CHANNEL_USER) it probably
> shouldn't even accept to be opened again by the likes of hcitool which
> uses HCI_CHANNEL_RAW as it can cause conflicts. If you really need a
> test tool that does send the command while in HCI_CHANNEL_USER then it
> must be send on that mode but I wouldn't do it with hcitool anyway as
> that is deprecated and this exercise seem to revolve to a entire stack
> on top of HCI_USER_CHANNEL then you shall use tools of that stack and
> mix with BlueZ userspace tools.
Our goal is eventually consistent with that aim (not having multiple
users to the socket when using HCI_CHANNEL_USER).
In the interim however, we have existing tooling that expects to be
able to write raw hci to the controller, get responses and not expect
any crashes (Intel Wireless Reporting Tools for example). hcitool is
just an easy test tool here and the real behavior being tested is RAW
channel injections not triggering the cmd timeout.
>
> > Tested on Chromebook with 5.4 kernel with patch (and applied cleanly on
> > bluetooth-next).
> >
> > net/bluetooth/hci_core.c | 26 +++++++++++++++++---------
> > 1 file changed, 17 insertions(+), 9 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index b3a5a3cc9372..c9a15f6633f7 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1481,17 +1481,25 @@ static void hci_cmd_timeout(struct work_struct *work)
> > struct hci_dev *hdev = container_of(work, struct hci_dev,
> > cmd_timer.work);
> >
> > - if (hdev->sent_cmd) {
> > - struct hci_command_hdr *sent = (void *) hdev->sent_cmd->data;
> > - u16 opcode = __le16_to_cpu(sent->opcode);
> > + /* Don't trigger the timeout behavior if it happens while we're in
> > + * userchannel mode. Userspace is responsible for handling any command
> > + * timeouts.
> > + */
> > + if (!(hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > + test_bit(HCI_UP, &hdev->flags))) {
> > + if (hdev->sent_cmd) {
> > + struct hci_command_hdr *sent =
> > + (void *)hdev->sent_cmd->data;
> > + u16 opcode = __le16_to_cpu(sent->opcode);
> >
> > - bt_dev_err(hdev, "command 0x%4.4x tx timeout", opcode);
> > - } else {
> > - bt_dev_err(hdev, "command tx timeout");
> > - }
> > + bt_dev_err(hdev, "command 0x%4.4x tx timeout", opcode);
> > + } else {
> > + bt_dev_err(hdev, "command tx timeout");
> > + }
> >
> > - if (hdev->cmd_timeout)
> > - hdev->cmd_timeout(hdev);
> > + if (hdev->cmd_timeout)
> > + hdev->cmd_timeout(hdev);
> > + }
>
> I wonder why hci_cmd_timeout is even active if the controller is in
> HCI_USER_CHANNEL mode, that sounds like a bug already.
This gets scheduled in hci_cmd_work. I tried not scheduling
hci_cmd_timeout in the first place but that caused the event stream to
hang (I think because subsequent tx work wasn't being scheduled). I
didn't dive very deep here and fix looked complex for a scenario that
we will migrate away from.
>
> > atomic_set(&hdev->cmd_cnt, 1);
> > queue_work(hdev->workqueue, &hdev->cmd_work);
> > --
> > 2.37.1.559.g78731f0fdb-goog
> >
>
>
> --
> Luiz Augusto von Dentz