On Wed, 2019-09-11 at 12:58 +0100, Linus Torvalds wrote:
And I didn't think about it or double-check, because the errors that
then followed later _looked_ like that TX power failing that I thought
hadn't happened.
Yeah, it could be something already got stuck there, hard to say.
We can see that something actually did an rfkill operation. Did you
push a button there?
No, I tried to turn off and turn on Wifi manually (no button, just the
settings panel).
That does usually also cause rfkill, so that explains how we got down
this particular code path.
I didn't notice the WARN_ON(), I just noticed that there was no
networking, and "turn it off and on again" is obviously the first
thing to try ;)
:-)
Sep 11 10:27:13 xps13 kernel: WARNING: CPU: 4 PID: 1246 at
net/mac80211/sta_info.c:1057 __sta_info_destroy_part2+0x147/0x150
[mac80211]
but if you want full logs I can send them in private to you.
No, it's fine, though maybe Kalle does - he was stepping out for a while
but said he'd look later.
This is the interesting time - 10:27:13 we get one of the first
failures. Really the first one was this:
Sep 11 10:27:07 xps13 kernel: ath10k_pci 0000:02:00.0: wmi command 16387 timeout, restarting hardware
I do suspect it's atheros and suspend/resume or something. The
wireless clearly worked for a while after the resume, but then at some
point it stopped.
I'm not really sure it's related to suspend/resume at all, the firmware
seems to just have gotten stuck, and the device and firmware most likely
got reset over the suspend/resume anyway.
The only explanation I therefore have is that something is just taking
*forever* in that code path, hence my question about timing information
on the logs.
Yeah, maybe it would time out everything eventually. But not for a
long time. It hadn't cleared up by
Sep 11 10:36:21 xps13 gnome-session-f[6837]: gnome-session-failed:
Fatal IO error 0 (Success) on X server :0.
Ok, that's way longer than I would have guessed even! That's over 9
minutes, that'd be close to 200 commands having to be issued and timing
out ...
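The "close to 200" estimate can be sanity-checked from the two timestamps
quoted above. This is only a rough sketch: the 3 second per-command timeout
is an assumption (it is not stated anywhere in this thread), and the helper
names are made up for illustration. It also assumes an English locale so
that "Sep" parses as a month name.

```python
from datetime import datetime

# Timestamps copied from the log lines quoted in this thread.
FIRST_TIMEOUT = "Sep 11 10:27:07"  # first "wmi command 16387 timeout"
STILL_STUCK = "Sep 11 10:36:21"    # gnome-session "Fatal IO error"

# ASSUMPTION: each stuck command waits about 3 seconds before timing out.
# This value is not given in the thread; adjust it to match the driver.
ASSUMED_TIMEOUT_S = 3.0


def parse_syslog(ts, year=2019):
    """Parse a syslog-style timestamp, which carries no year of its own."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(start, end):
    """Seconds between two syslog-style timestamps."""
    return (parse_syslog(end) - parse_syslog(start)).total_seconds()


def estimated_timeouts(start, end, timeout_s=ASSUMED_TIMEOUT_S):
    """How many back-to-back timeouts of timeout_s fit in the window."""
    return int(elapsed_seconds(start, end) // timeout_s)


if __name__ == "__main__":
    secs = elapsed_seconds(FIRST_TIMEOUT, STILL_STUCK)
    count = estimated_timeouts(FIRST_TIMEOUT, STILL_STUCK)
    print(f"elapsed: {secs:.0f} s")
    print(f"timeouts at {ASSUMED_TIMEOUT_S:g} s each: {count}")
```

With these inputs the window is 554 seconds, which at the assumed 3 seconds
per command works out to 184 timeouts, in line with the rough figure above.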
I don't know. What I wrote before is basically all I can say: I think
the driver gets stuck somewhere waiting for the device "forever", and
the stack just doesn't get to release the lock, causing all the
follow-up problems.