Re: PROBLEM: Wireless networking goes down on Acer C720P Chromebook (bisected)

From: Toke Høiland-Jørgensen
Date: Mon Jan 06 2020 - 08:34:31 EST


Kan Yan <kyan@xxxxxxxxxx> writes:

>> I ran a ping, and saw this:
>>
>> - pings coming back in <5ms
>> - re-enable AQL (echo 7 | tee airtime_flags)
>> - pings stop coming back immediately
>> - some seconds later, disable AQL again (echo 3 | tee airtime_flags)
>> - immediate *flood* of ping replies registered, with times 16000ms,
>> 15000ms, 14000ms, .. down to 1000ms, 15ms, then stabilizing sub-5ms
>> - According to the icmp_seq values, all 28 requests were replied to,
>> and their replies were delivered in-order
>>
>> This certainly looks like a missing TX queue restart to me?
> I don't think TX queue restart is "missing", the TX queue should get
> restarted when the pending frames are completed and returned to the
> host driver. However, it looks like there is some issue with the
> deficit refill logic in ath9k, and the TX queue got blocked due to the
> negative deficit.

s/deficit refill/packet freeing/. I.e., there's an issue with the ath9k
driver either stomping on the tx_time_est field in the cb, or it's not
reporting back all freed TX skbs properly, so the AQL Q depth doesn't go
back down.

The large negative deficit is just because the queue is being blocked by
AQL, so it won't get its deficit refilled (and it keeps decreasing as RX
packets are being accounted).
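
To make the failure mode described above concrete, here is a toy model in
Python. It is only a sketch of the accounting as described in this mail, not
the mac80211 or ath9k code: the class ToyQueue, the limit AQL_LIMIT_US, the
quantum and all numbers are made up for illustration, and the refill rule is
deliberately simplified to "no refill while AQL blocks the queue".

```python
from dataclasses import dataclass

AQL_LIMIT_US = 12000  # hypothetical per-queue airtime limit (microseconds)


@dataclass
class Frame:
    tx_time_est: int  # airtime estimate stored with the frame at dequeue time


class ToyQueue:
    """Illustrative model only; names and rules are assumptions."""

    def __init__(self, quantum_us=500):
        self.pending_us = 0  # estimated airtime handed to the driver, not yet freed
        self.deficit_us = 0  # fairness deficit, charged for RX airtime
        self.quantum_us = quantum_us

    def may_transmit(self):
        # The queue is blocked once too much airtime is outstanding.
        return self.pending_us < AQL_LIMIT_US

    def dequeue(self, est_us):
        if not self.may_transmit():
            return None
        self.pending_us += est_us
        return Frame(est_us)

    def free(self, frame):
        # Uses whatever estimate is still stored in the frame.
        self.pending_us -= frame.tx_time_est

    def account_rx(self, airtime_us):
        self.deficit_us -= airtime_us

    def schedule(self):
        # Simplification: the deficit is only refilled while the queue is
        # not blocked by the airtime limit.
        if self.may_transmit() and self.deficit_us < 0:
            self.deficit_us += self.quantum_us


def run(driver_stomps_estimate):
    q = ToyQueue()
    in_flight = []
    # Hand frames to the driver until the airtime limit blocks the queue.
    while (frame := q.dequeue(1500)) is not None:
        in_flight.append(frame)
    # The driver completes every frame it was given.
    for frame in in_flight:
        if driver_stomps_estimate:
            frame.tx_time_est = 0  # estimate overwritten before the frame is freed
        q.free(frame)
    # RX airtime keeps being charged while the scheduler tries to refill.
    for _ in range(10):
        q.account_rx(500)
        q.schedule()
    return q.may_transmit(), q.pending_us, q.deficit_us
```

With run(False) the pending airtime drops back to zero once the frames are
freed and the deficit stays level. With run(True) the pending airtime stays
stuck at the limit, the queue stays blocked, and the deficit only ever
decreases. A driver that fails to report some freed frames at all would leave
the pending airtime stuck in the same way.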

All this being said, given the fact that ath9k definitely doesn't need
AQL, I think it's probably not worth it to try to find out exactly what
is causing this, and instead just leave AQL off for that driver?

(As an aside, this definitely disproves my initial "AQL should be benign
for drivers that don't need it" hypothesis. Guess that was way too
optimistic anyway ;))

-Toke