Re: [PATCH] rt2x00: fix rx queue hang

From: Stanislaw Gruszka
Date: Tue Jun 18 2019 - 05:39:42 EST


On Mon, Jun 17, 2019 at 11:46:56AM +0200, Soeren Moch wrote:
> Since commit ed194d136769 ("usb: core: remove local_irq_save() around
> ->complete() handler") the handlers rt2x00usb_interrupt_rxdone() and
> rt2x00usb_interrupt_txdone() are not running with interrupts disabled
> anymore. So these handlers are not guaranteed to run completely before
> workqueue processing starts. So only mark entries ready for workqueue
> processing after proper accounting in the dma done queue.
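The ordering argument in the quoted description can be checked with a small model. This is a minimal sketch that makes three simplifying assumptions. It models a single entry. It uses the three operations from rt2x00lib_dmadone(). It assumes sequentially consistent interleaving: the worker inspects the entry after some prefix of the handler's steps, and CPU memory reordering is ignored. struct model_entry, apply_step() and count_bad_interleavings() are hypothetical names, not rt2x00 code. The model counts the points at which a worker would find the entry marked pending before its DMA-done accounting has happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one queue entry; the fields only
 * mirror the rt2x00 flags and index, they are not the driver's types. */
struct model_entry {
	bool status_pending;   /* stands in for ENTRY_DATA_STATUS_PENDING */
	bool owner_device;     /* stands in for ENTRY_OWNER_DEVICE_DATA */
	unsigned int dma_done; /* stands in for Q_INDEX_DMA_DONE accounting */
};

enum model_step { SET_PENDING, CLEAR_OWNER, INC_DMA_DONE };

static void apply_step(struct model_entry *e, enum model_step s)
{
	switch (s) {
	case SET_PENDING:
		e->status_pending = true;
		break;
	case CLEAR_OWNER:
		e->owner_device = false;
		break;
	case INC_DMA_DONE:
		e->dma_done++;
		break;
	}
}

/*
 * Let the worker look at the entry after the first k handler steps, for
 * every k, and count the interleavings in which it would pick the entry
 * up (pending set) while the DMA-done accounting is still missing.
 */
static unsigned int count_bad_interleavings(const enum model_step *steps,
					    size_t nsteps)
{
	unsigned int bad = 0;

	assert(steps != NULL || nsteps == 0);
	for (size_t k = 0; k <= nsteps; k++) {
		struct model_entry e = { .status_pending = false,
					 .owner_device = true,
					 .dma_done = 0 };

		for (size_t i = 0; i < k; i++)
			apply_step(&e, steps[i]);

		/* Worker: only processes entries marked pending. */
		if (e.status_pending && e.dma_done == 0)
			bad++;
	}
	return bad;
}

/* Order before the patch: mark pending first. */
static const enum model_step old_order[] = { SET_PENDING, CLEAR_OWNER, INC_DMA_DONE };
/* Order after the patch: do the accounting first, mark pending last. */
static const enum model_step new_order[] = { CLEAR_OWNER, INC_DMA_DONE, SET_PENDING };
```

With the pre-patch order, count_bad_interleavings(old_order, 3) returns 2: the bad points fall after the first and second steps. With the patched order it returns 0, because the flag is the last thing written.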

It was always the case on SMP machines that rt2x00usb_interrupt_{tx,rx}done
can run concurrently with rt2x00_work_{rx,tx}done, so I do not
understand how removing local_irq_save() around the ->complete() handler
could have introduced the problem.
Have you tried reverting commit ed194d136769, and does the revert solve the problem?

Between 4.19 and 4.20 there were some fairly big changes in the rt2x00 driver:

0240564430c0 rt2800: flush and txstatus rework for rt2800mmio
adf26a356f13 rt2x00: use different txstatus timeouts when flushing
5022efb50f62 rt2x00: do not check for txstatus timeout every time on tasklet
0b0d556e0ebb rt2800mmio: use txdone/txstatus routines from lib
5c656c71b1bf rt2800: move usb specific txdone/txstatus routines to rt2800lib

so I'm a bit worried that one of those changes, not ed194d136769, is the
real cause of the issue.

> Note that rt2x00usb_work_rxdone() processes all available entries, not
> only such for which queue_work() was called.
> This fixes a regression on a RT5370 based wifi stick in AP mode, which
> suddenly stopped data transmission after some period of heavy load. Also
> stopping the hanging hostapd resulted in the error message "ieee80211
> phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush".
> Other operation modes are probably affected as well, this just was
> the used testcase.

Do you know what actually makes the traffic stop: a hung TX queue or a
hung RX queue?

> diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
> index 1b08b01db27b..9c102a501ee6 100644
> --- a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
> +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
> @@ -263,9 +263,9 @@ EXPORT_SYMBOL_GPL(rt2x00lib_dmastart);
>  void rt2x00lib_dmadone(struct queue_entry *entry)
>  {
> -	set_bit(ENTRY_DATA_STATUS_PENDING, &entry->flags);
>  	clear_bit(ENTRY_OWNER_DEVICE_DATA, &entry->flags);
>  	rt2x00queue_index_inc(entry, Q_INDEX_DMA_DONE);
> +	set_bit(ENTRY_DATA_STATUS_PENDING, &entry->flags);

Unfortunately I do not understand how this is supposed to fix the problem.
Could you elaborate on this change?