Re: [PATCH] Bluetooth: Limit duration of Remote Name Resolve
From: Marcel Holtmann
Date: Thu Oct 28 2021 - 09:30:53 EST
Hi Archie,
> When doing remote name request, we cannot scan. In the normal case it's
> OK since we can expect it to finish within a short amount of time.
> However, there is a possibility to scan lots of devices that
> (1) require Remote Name Resolve
> (2) are unresponsive to Remote Name Resolve
> When this happens, we are stuck doing Remote Name Resolve until all of
> them are done before we can continue scanning.
>
> This patch adds a time limit to stop us spending too long on remote
> name request. The limit is increased for every iteration where we fail
> to complete the RNR in order to eventually solve all names.
>
> Signed-off-by: Archie Pusaka <apusaka@xxxxxxxxxxxx>
> Reviewed-by: Miao-chen Chou <mcchou@xxxxxxxxxxxx>
>
> ---
> Hi maintainers, we found one instance where a test device spends ~90
> seconds to do Remote Name Resolving, hence this patch.
> I think it's better if we reset the time limit to the default value
> at some point, but I don't have a good proposal where to do that, so
> in the end I didn't.
do you have a btmon trace for this as well?
The HCI Remote Name Request is essentially a paging procedure followed by a few LMP messages. It is fundamentally a connection request inside BR/EDR, and if you have a remote device that has page scan disabled but inquiry scan enabled, then you get into this funky situation. Sadly, the BR/EDR parts don’t give you any hint about this weird combination. You can't configure BlueZ that way since it is a really stupid setup, and as far as I remember GAP doesn’t have this case either, but it can happen. So we might want to check if that is what happens. And of course it needs to be a Bluetooth 2.0 device or a device that doesn’t support Secure Simple Pairing. There is also a chance of really bad radio interference, but that is then just bad luck and is only going to happen once in a blue moon.
That said, you should receive a Page Timeout in the Remote Name Request Complete event for what you describe. Or you just use HCI Remote Name Request Cancel to abort the paging. If I remember correctly, the Page Timeout setting is also applied to the Remote Name resolving procedure, so we could tweak that value. Actually, once we get the “sync” work merged, we could configure different Page Timeout values for connection requests and name resolving if that would help. Not sure if this is worth it, since we could just as simply cancel the request.
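To make the two options concrete, here is a rough sketch of what the two HCI commands look like on the wire. It is not the kernel code path; the helper names (build_hci_cmd and friends) are made up for illustration, and the buffer holds only the command header and parameters, without any transport packet indicator. The opcodes and the 0.625 ms slot unit come from the Core spec. Note that the spec default Page Timeout of 0x2000 slots is 5.12 seconds, which is the same value the patch picks for NAME_RESOLVE_INIT_DURATION.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcodes per the Bluetooth Core spec: (OGF << 10) | OCF. */
#define OP_REMOTE_NAME_REQ_CANCEL	0x041a	/* OGF 0x01, OCF 0x001a */
#define OP_WRITE_PAGE_TIMEOUT		0x0c18	/* OGF 0x03, OCF 0x0018 */

/* Page Timeout is counted in baseband slots of 0.625 ms each.
 * The caller must keep the result within 16 bits.
 */
static uint16_t page_timeout_ms_to_slots(uint32_t ms)
{
	return (uint16_t)(ms * 8 / 5);	/* ms / 0.625 */
}

/* Hypothetical helper: opcode (little endian), parameter length,
 * then the parameters. Returns the number of bytes written.
 */
static size_t build_hci_cmd(uint8_t *buf, uint16_t opcode,
			    const void *params, uint8_t plen)
{
	buf[0] = opcode & 0xff;
	buf[1] = opcode >> 8;
	buf[2] = plen;
	if (plen)
		memcpy(&buf[3], params, plen);
	return 3 + (size_t)plen;
}

/* Abort an ongoing Remote Name Request; bdaddr is in wire order
 * (least significant byte first).
 */
static size_t build_remote_name_req_cancel(uint8_t *buf,
					   const uint8_t bdaddr[6])
{
	return build_hci_cmd(buf, OP_REMOTE_NAME_REQ_CANCEL, bdaddr, 6);
}

/* Change the Page Timeout; the single parameter is a 16-bit slot count. */
static size_t build_write_page_timeout(uint8_t *buf, uint32_t timeout_ms)
{
	uint16_t slots = page_timeout_ms_to_slots(timeout_ms);
	uint8_t params[2] = { slots & 0xff, slots >> 8 };

	return build_hci_cmd(buf, OP_WRITE_PAGE_TIMEOUT, params,
			     sizeof(params));
}
```

Cancelling is the cheaper of the two since it only touches the request in flight, while Write Page Timeout changes a controller-wide setting that connection requests also use.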
> include/net/bluetooth/hci_core.h | 5 +++++
> net/bluetooth/hci_event.c | 12 ++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index dd8840e70e25..df9ffedf1d29 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -87,6 +87,8 @@ struct discovery_state {
> u8 (*uuids)[16];
> unsigned long scan_start;
> unsigned long scan_duration;
> + unsigned long name_resolve_timeout;
> + unsigned long name_resolve_duration;
> };
>
> #define SUSPEND_NOTIFIER_TIMEOUT msecs_to_jiffies(2000) /* 2 seconds */
> @@ -805,6 +807,8 @@ static inline void sco_recv_scodata(struct hci_conn *hcon, struct sk_buff *skb)
> #define INQUIRY_CACHE_AGE_MAX (HZ*30) /* 30 seconds */
> #define INQUIRY_ENTRY_AGE_MAX (HZ*60) /* 60 seconds */
>
> +#define NAME_RESOLVE_INIT_DURATION 5120 /* msec */
> +
> static inline void discovery_init(struct hci_dev *hdev)
> {
> hdev->discovery.state = DISCOVERY_STOPPED;
> @@ -813,6 +817,7 @@ static inline void discovery_init(struct hci_dev *hdev)
> INIT_LIST_HEAD(&hdev->discovery.resolve);
> hdev->discovery.report_invalid_rssi = true;
> hdev->discovery.rssi = HCI_RSSI_INVALID;
> + hdev->discovery.name_resolve_duration = NAME_RESOLVE_INIT_DURATION;
> }
>
> static inline void hci_discovery_filter_clear(struct hci_dev *hdev)
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 3cba2bbefcd6..104a1308f454 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -2086,6 +2086,15 @@ static bool hci_resolve_next_name(struct hci_dev *hdev)
> if (list_empty(&discov->resolve))
> return false;
>
> + /* We should stop if we already spent too much time resolving names.
> + * However, double the name resolve duration for the next iterations.
> + */
> + if (time_after(jiffies, discov->name_resolve_timeout)) {
> + bt_dev_dbg(hdev, "Name resolve takes too long, stopping.");
> + discov->name_resolve_duration *= 2;
> + return false;
> + }
> +
> e = hci_inquiry_cache_lookup_resolve(hdev, BDADDR_ANY, NAME_NEEDED);
> if (!e)
> return false;
> @@ -2634,6 +2643,9 @@ static void hci_inquiry_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
> if (e && hci_resolve_name(hdev, e) == 0) {
> e->name_state = NAME_PENDING;
> hci_discovery_set_state(hdev, DISCOVERY_RESOLVING);
> +
> + discov->name_resolve_timeout = jiffies +
> + msecs_to_jiffies(discov->name_resolve_duration);
So if this is really caused by a device with page scan disabled and inquiry scan enabled, then this fix just papers over the hole. If you have more devices requiring name resolving, you end up penalizing them and making the discovery procedure worse, up to the extent that no names are resolved. So I wouldn’t be in favor of this.
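To illustrate the concern, here is a toy model of one resolving round with a time budget. It is only a sketch under assumptions: the struct and function names are invented, the per-device costs are made-up numbers, and it mirrors nothing but the shape of the time_after() check and the doubling from the patch (check the budget before each request, double it when exceeded).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an inquiry cache entry that needs its name resolved. */
struct toy_device {
	unsigned int cost_ms;	/* time one name request takes (assumed) */
	bool responds;		/* false: request ends in Page Timeout */
};

struct toy_round {
	size_t attempted;		/* name requests actually sent */
	size_t resolved;		/* names obtained */
	unsigned int next_budget_ms;	/* budget for the next round */
};

/* Walk the resolve list in order. Before each request, stop if the
 * elapsed time already exceeds the budget and double the budget for
 * the next round, like the check added to hci_resolve_next_name().
 */
static struct toy_round toy_resolve_round(const struct toy_device *devs,
					  size_t n, unsigned int budget_ms)
{
	struct toy_round r = { 0, 0, budget_ms };
	unsigned int elapsed_ms = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (elapsed_ms > budget_ms) {
			r.next_budget_ms = budget_ms * 2;
			break;
		}

		elapsed_ms += devs[i].cost_ms;
		r.attempted++;
		if (devs[i].responds)
			r.resolved++;
	}

	return r;
}
```

With two unresponsive devices (say 5200 ms each, an assumed cost of one page timeout plus overhead) queued ahead of two responsive ones (200 ms each) and a 5120 ms budget, the first round ends after a single failed attempt with zero names resolved, and the responsive devices are never asked. In this model it takes a budget of 20480 ms before they get their turn.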
What LE scan window/interval is actually working against what configured BR/EDR page timeout here? The discovery procedure is something that a user triggers, so we always had that one take higher priority since the user is expecting results. This means any tweaking needs to be considered carefully since it has an immediate user impact if the name is missing.
Is this an LE background scan you are worried about or an LE active scan that runs in parallel during discovery?
Regards
Marcel