I'm also planning to provide some more debug patches, to figure out
which part of commit 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs
for resumption") fixes the issue for you. Assuming my understanding
above is correct the patch should not really fix/break anything for
you... With the findings above I would have expected your git bisect to
identify commit a790cc3a4fad ("wifi: mac80211: add wake_tx_queue
callback to drivers") as the first broken commit...

I can't point to any specific series of events where it would go wrong, but I suspect that the problem might be the fact that you're doing tx scheduling from within ieee80211_handle_wake_tx_queue. I don't see how it's properly protected from potentially being called on different CPUs concurrently.
Back when I was debugging some iTXQ issues in mt76, I also had problems when tx scheduling could happen from multiple places. My solution was to have a single worker thread that handles tx, which is scheduled from the wake_tx_queue op.
Maybe you could do something similar in mac80211 for non-iTXQ drivers.
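
To illustrate the pattern, here is a rough userspace sketch using pthreads. It is not mac80211 or mt76 code, and every name in it (tx_ctx, demo_wake_tx_queue, demo_tx_worker, demo_run) is made up for the example. The wake hook may be called from any number of threads, but it only records pending work and kicks the worker; the actual "transmit" step runs in exactly one thread, so it needs no protection against concurrent callers.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state; "pending" stands in for the queued frames. */
struct tx_ctx {
	pthread_mutex_t lock;
	pthread_cond_t kick;
	unsigned long pending;	/* protected by lock */
	unsigned long sent;	/* only ever touched by the worker */
	bool stop;		/* protected by lock */
};

/* wake_tx_queue-like hook: safe to call concurrently, never transmits. */
static void demo_wake_tx_queue(struct tx_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->pending++;
	pthread_cond_signal(&ctx->kick);
	pthread_mutex_unlock(&ctx->lock);
}

/* The single worker: the only place where tx scheduling happens. */
static void *demo_tx_worker(void *arg)
{
	struct tx_ctx *ctx = arg;

	pthread_mutex_lock(&ctx->lock);
	for (;;) {
		while (!ctx->pending && !ctx->stop)
			pthread_cond_wait(&ctx->kick, &ctx->lock);
		if (!ctx->pending)
			break;	/* stop requested and nothing left to send */

		unsigned long batch = ctx->pending;
		ctx->pending = 0;
		pthread_mutex_unlock(&ctx->lock);

		ctx->sent += batch;	/* "transmit" outside the lock */

		pthread_mutex_lock(&ctx->lock);
	}
	pthread_mutex_unlock(&ctx->lock);
	return NULL;
}

struct producer_arg {
	struct tx_ctx *ctx;
	unsigned long count;
};

/* Stands in for one CPU calling the wake hook repeatedly. */
static void *demo_producer(void *arg)
{
	struct producer_arg *p = arg;

	for (unsigned long i = 0; i < p->count; i++)
		demo_wake_tx_queue(p->ctx);
	return NULL;
}

/* Run up to 16 concurrent producers; return how many frames were "sent". */
unsigned long demo_run(unsigned int producers, unsigned long per_producer)
{
	struct tx_ctx ctx = { .pending = 0, .sent = 0, .stop = false };
	struct producer_arg parg = { &ctx, per_producer };
	pthread_t worker, threads[16];

	if (producers > 16)
		producers = 16;
	pthread_mutex_init(&ctx.lock, NULL);
	pthread_cond_init(&ctx.kick, NULL);
	pthread_create(&worker, NULL, demo_tx_worker, &ctx);

	for (unsigned int i = 0; i < producers; i++)
		pthread_create(&threads[i], NULL, demo_producer, &parg);
	for (unsigned int i = 0; i < producers; i++)
		pthread_join(threads[i], NULL);

	/* All wakes are in; ask the worker to drain and exit. */
	pthread_mutex_lock(&ctx.lock);
	ctx.stop = true;
	pthread_cond_signal(&ctx.kick);
	pthread_mutex_unlock(&ctx.lock);
	pthread_join(worker, NULL);

	pthread_cond_destroy(&ctx.kick);
	pthread_mutex_destroy(&ctx.lock);
	return ctx.sent;
}
```

Calling demo_run(4, 1000) from a small main() (built with -pthread) exercises four concurrent callers of the wake hook. In the kernel, the equivalent would presumably be a work item or kthread kicked from the wake_tx_queue op rather than a condition variable.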