Re: [PATCH] wifi: rtw88: reduce failed to flush queue severity

From: Lewis Robbins
Date: Mon Apr 15 2024 - 19:31:19 EST


Ping-Ke Shih <pkshih@xxxxxxxxxxx> writes:

> Lewis Robbins <lewis.robbins2@xxxxxxxxx> wrote:
>>
>> Reduce the log message severity when we fail to flush device priority
>> queue. If a system has a lot of traffic, we may fail to flush the queue
>> in time. This generates a lot of messages in the kernel ring buffer. As
>> this is a common occurrence, we should use dev_info instead of dev_warn.
>>
>> Signed-off-by: Lewis Robbins <lewis.robbins2@xxxxxxxxx>
>
> Acked-by: Ping-Ke Shih <pkshih@xxxxxxxxxxx>
>
> I'd like to know situations of " If a system has a lot of traffic...".
> Did you scan or do something during traffic?

So, after digging a bit more, it seems you're right this only happens during a
scan. The log message itself is repeated about 5-10x.

Kalle Valo <kvalo@xxxxxxxxxx> writes:

> The driver shouldn't print any warnings in normal usage, even using info
> level. If this is expected scenario then maybe change it to debug print?
> Or if is this an actual bug then it's better try to investigate and fix
> it.

I have the stack-trace:

[23838.633664] rtw_8821ce 0000:02:00.0: timed out to flush queue 2
[23838.633685] CPU: 1 PID: 363059 Comm: kworker/u8:1 Tainted G 6.8.5
[23838.633698] Hardware name: /, BIOS 5.26 09/26/2023
[23838.633704] Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]
[23838.633881] Call Trace:
[23838.633889] <TASK>
[23838.633898] dump_stack_lvl+0x47/0x60
[23838.633918] rtw_mac_flush_queues+0x148/0x190 [rtw88_core 0d7ad2d9d6116c633c0aab4e7bc6016d572d75d4]
[23838.633993] rtw_ops_flush+0x5a/0x70 [rtw88_core 0d7ad2d9d6116c633c0aab4e7bc6016d572d75d4]
[23838.634056] __ieee80211_flush_queues+0x10b/0x2e0 [mac80211 5d0b446baffe1290bc56d55aa496e941688b7b40]
[23838.634309] ieee80211_scan_work+0x3e3/0x520 [mac80211 5d0b446baffe1290bc56d55aa496e941688b7b40]
[23838.634494] cfg80211_wiphy_work+0xa7/0xe0 [cfg80211 b36d5437ba649ace42ea92e8f83a3ec499e0d5b7]
[23838.634646] process_one_work+0x178/0x350
[23838.634660] worker_thread+0x30f/0x450
[23838.634670] ? __pfx_worker_thread+0x10/0x10
[23838.634678] kthread+0xe5/0x120
[23838.634691] ? __pfx_kthread+0x10/0x10
[23838.634702] ret_from_fork+0x31/0x50
[23838.634714] ? __pfx_kthread+0x10/0x10
[23838.634724] ret_from_fork_asm+0x1b/0x30
[23838.634736] </TASK>

I'm not sure as to the cause. If the flush operation takes a long time do we
need to release any mutexes etc? And if this is just a hardware issue, then we
can do a debug print as you say.

BugZilla: https://bugzilla.kernel.org/show_bug.cgi?id=218697