Re: [PATCH] wifi: rtw89: retry efuse physical map dump on transient failure

From: Christian Hewitt

Date: Thu Mar 12 2026 - 04:12:50 EST


> On 12 Mar 2026, at 11:39 am, Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
>
> Christian Hewitt <christianshewitt@xxxxxxxxx> wrote:
>>> On 12 Mar 2026, at 6:22 am, Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
>>>
>>> Christian Hewitt <christianshewitt@xxxxxxxxx> wrote:
>>>>> On 11 Mar 2026, at 7:05 am, Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
>>>>>
>>>>> Christian Hewitt <christianshewitt@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> On 9 Mar 2026, at 6:35 am, Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Christian Hewitt <christianshewitt@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> On 2 Mar 2026, at 10:04 am, Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Christian Hewitt <christianshewitt@xxxxxxxxx> wrote:
>>>>>>>>>>> On 2 Mar 2026, at 9:47 am, Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Christian Hewitt <christianshewitt@xxxxxxxxx> wrote:
>>>>>>>>>>>> On Radxa Rock 5B with a RTL8852BE combo WiFi/BT card, the efuse
>>>>>>>>>>>> physical map dump intermittently fails with -EBUSY during probe.
>>>>>>>>>>>> The failure occurs in rtw89_dump_physical_efuse_map_ddv() where
>>>>>>>>>>>> read_poll_timeout_atomic() times out waiting for the B_AX_EF_RDY
>>>>>>>>>>>> bit after 1 second.
>>>>>>>>>>>
>>>>>>>>>>> I'm checking internally how we handle this case.
>>>>>>>
>>>>>>> Sorry for the late reply.
>>>>>>>
>>>>>>> We have seen WiFi and BT reading the efuse at the same time cause a
>>>>>>> problem similar to yours. Our workaround is like yours: it increases
>>>>>>> the timeout.
>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [...]
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> For context, firmware also fails (and recovers) sometimes:
>>>>>>>>>>>
>>>>>>>>>>> Did you mean this doesn't always happen? sometimes?
>>>>>>>>>>
>>>>>>>>>> It’s another intermittent behaviour observed on this board (and not
>>>>>>>>>> related to the issue this patch targets). It occurs less frequently
>>>>>>>>>> than the efuse issue and the existing retry mechanism in the driver
>>>>>>>>>> ensures firmware load always succeeds.
>>>>>>>
>>>>>>> This might be the same cause due to reading efuse in firmware.
>>>>>>>
>>>>>>> Though we can add more timeout and retry times as workaround, I wonder
>>>>>>> if you can control loading time of WiFi and BT kernel modules?
>>>>>>>
>>>>>>> Also, can you run an experiment where you load the BT module first, and
>>>>>>> then load the WiFi module after 10 seconds (a large number chosen
>>>>>>> intentionally, or even larger)?
>>>>>>
>>>>>> https://paste.libreelec.tv/charmed-turkey.sh
>>>>>>
>>>>>> I’ve run the above script ^ which removes the wifi and bt modules in
>>>>>> sequence then reloads them in the reverse order with a delay between
>>>>>> bt and wifi modules loading, then checks for error messages. Over 200
>>>>>> test cycles with a 10s delay all were clean (no errors). I also ran
>>>>>> cycles with a 2 second delay and 0 second delay before starting wifi
>>>>>> module load and those were clear too. I guess that proves sequencing
>>>>>> avoids the efuse contention issue? - although it’s not possible in
>>>>>> the real-world so not sure there’s huge value in knowing that :)
>>>>>
>>>>> Thanks for the experiments.
>>>>>
>>>>> Still want to know is it possible to change sequence/time of loading
>>>>> kernel modules at boot time from system level? I mean can you adjust
>>>>> the sequence in the Rock 5B board?
>>>>
>>>> I’m not a kernel expert, but I’ve always understood module probe and
>>>> load ordering to not be guaranteed; as many things run in parallel and
>>>> are highly dependent on the specific hardware capabilities and kernel
>>>> config being used.
>>>
>>> I have heard of people changing the load sequence/timing of kernel
>>> modules, so I'd like you to try this method.
>>>
>>> I did ask AI, it said it is possible to create a .conf file under
>>> /etc/modprobe.d/ and use `softdep` syntax to ensure loading sequence.
>>> Could you try this?
>>
>> I can test this, but even if it works it’s not a fix because modprobe
>> confs configured in userspace are only used with loadable modules that
>> have been compiled with =m, not built-in modules that are resident in
>> kernel memory and compiled with =y; and distros are free to choose how
>> their kernel is configured. NB: I’m not sure if there are any general
>> kernel rules for this, but I’d expect there to be general principle of
>> modules being resilient to transient host states and not depending on
>> userspace packaging to load correctly?
>
> I think built-in modules will be loaded sequentially (not in parallel)
> by device_initcall(), so BT and WiFi drivers will not read efuse
> at the same time.

Even if built-in modules are loaded sequentially, the kernel still has
many dynamically loaded modules; and distros can configure that mix as
they like, so you still cannot predict or guarantee the outcome. That
could be changed by requiring rtw89 modules to be =y, but that goes
against the principles of a modular kernel and I’d expect appropriately
rude comments to the idea if submitted :)

>>>>> In addition, did the messages below not appear in these experiments?
>>>>>
>>>>> [ 7.864148] rtw89_8852be 0002:21:00.0: fw security fail
>>>>> [ 7.864154] rtw89_8852be 0002:21:00.0: download firmware fail
>>>>
>>>> No, because even if we have a 0s delay between each group of modules
>>>> being loaded, they are loaded in series, so we workaround the issue.
>>>> Tweaking the script to background the module load loops so both run
>>>> in parallel would be closer to normal conditions, and I would expect
>>>> to start seeing failures and the retry mechanisms within the modules
>>>> (as added in this patch) being triggered.
>>>
>>> An additional question about downloading firmware. When you first
>>> reported this issue (modules loaded in parallel at boot time), this
>>> message seemed to appear by chance. Since the driver retries the
>>> firmware download, does it eventually succeed? Or does it still
>>> fail to download after 5 retries?
>>
>> I have only seen firmware load fail a handful of times in many hundreds
>> of boots and each time one retry attempt resulted in success. To be
>> clear; I am not reporting firmware loading as a problem, it is not
>> an issue for me. I’ve mentioned it only for context, i.e. it shows that
>> a simple retry mechanism is effective at handling the similar issue with
>> efuse map.
>
> I ask because I wonder whether the firmware download issue might
> also be an efuse read issue. If so, retrying might resolve it as well.

Hard to know, but it's an infrequent event and the existing retry
mechanism appears to work fine.

> From your results, it looks like retrying the efuse read can resolve
> all the issues you found. What do you think?

The patch submitted resolves the efuse map dump for me. If there are more
efuse accesses that need to be addressed I haven’t seen them in tests. If
you are hinting to abstract things further I’d ask you to please propose
an alternative patch that I can test for you; I’m firmly at the novice end
of kernel contributors and unlikely to spot where changes might be needed
without being spoon-fed rather explicit instructions :)
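
For reference, the retry idea can be sketched roughly as below. This is a
simplified userspace illustration and not the patch itself:
dump_efuse_map_once(), dump_efuse_map_retry(), EFUSE_DUMP_MAX_TRIES and
the busy_attempts_left counter are all made-up names, and the stub only
simulates the efuse being busy for the first two attempts.

```c
#include <errno.h>

/* Hypothetical retry limit; the actual patch may use a different value. */
#define EFUSE_DUMP_MAX_TRIES 5

/* Simulated contention: how many more attempts will see the efuse busy. */
static int busy_attempts_left = 2;

/* Stub for one dump attempt: returns -EBUSY while the efuse is "busy". */
static int dump_efuse_map_once(void)
{
	if (busy_attempts_left > 0) {
		busy_attempts_left--;
		return -EBUSY;
	}
	return 0;
}

/*
 * Retry only on -EBUSY (treated as transient); any other result,
 * success included, ends the loop immediately. *tries_used reports
 * how many attempts were actually made.
 */
static int dump_efuse_map_retry(int *tries_used)
{
	int ret = -EBUSY;
	int i;

	for (i = 0; i < EFUSE_DUMP_MAX_TRIES; i++) {
		ret = dump_efuse_map_once();
		if (ret != -EBUSY) {
			i++;	/* count the attempt that ended the loop */
			break;
		}
	}
	*tries_used = i;
	return ret;
}
```

With the stub as written, dump_efuse_map_retry() returns 0 on the third
attempt. If the efuse stayed busy for all attempts it would return -EBUSY
after EFUSE_DUMP_MAX_TRIES tries, so a persistent failure is still
reported to the caller.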

Christian