Re: iwlwifi getting stuck with current Linus' tree (646da63172)

From: Grumbach, Emmanuel
Date: Thu Apr 23 2015 - 05:13:05 EST


On Thu, 2015-04-23 at 10:15 +0200, Jiri Kosina wrote:
> On Thu, 23 Apr 2015, Grumbach, Emmanuel wrote:
>
> > > I've been running current Linus' tree and have been getting system lockups
> > > frequently. After a few "silent" lockups, I was able to obtain a dmesg
> > > before the machine turned dead again (wifi stopped working shortly before
> > > that).
> > >
> > > Before starting to debug / bisect (last known good on this machine is
> > > 4.0-rc6), I am attaching the dmesg in case someone already knows what the
> > > issue is.
> > >
> >
> > I briefly went over the iwlwifi commits between 4.0-rc6 and linux/master
> > and couldn't find anything obvious.
> > Note that for the device you have, the commits that touch
> > drivers/net/wireless/iwlwifi/mvm are not relevant.
> >
> > What you are seeing is that the PCI host is disconnecting the WiFi NIC
> > for some weird reason. It is not the first time I see that, but
> > unfortunately, I have never been able to debug this. I am personally not
> > a HW PCI expert and I couldn't reproduce either...
> >
> > I am afraid I won't save you the time of the bisection, but I am not
> > entirely sure that bisecting the iwlwifi driver is enough to find the
> > commit that broke it. You may want to bisect the pci bus driver as well.
>
> The problem is that I can't really reliably reproduce it; it happens
> rather often, but not so often that I could be certainly sure that my
> distinction of good and bad kernels would be accurate.
>
> I will try it, but I expect the result to be bogus because of this,
> unfortunately.
>

I can understand. A few users reported that this bug occurred more
reliably when moving their system, although it seems very weird to me.

> > First question is: Are you sure that 4.0-rc6 was good?
>
> Pretty much, yes. I've been running it for quite some time on this
> machine without any issues. But after updating to current HEAD two days
> ago, the issue triggered like 6 or 7 times already.
>

Ok - I will try to look at the PCI commits there although I am not sure
I'll be able to make much sense of them...

> Thanks,
>
