Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not boot after cold boots, boots after reboot

From: Frederic Weisbecker
Date: Fri Feb 24 2017 - 22:29:06 EST


On Thu, Feb 23, 2017 at 07:40:13PM +0100, Pavel Machek wrote:
> On Thu 2017-02-23 17:28:26, Frederic Weisbecker wrote:
> > On Tue, Feb 14, 2017 at 08:27:43PM +0100, Pavel Machek wrote:
> > > On Tue 2017-02-14 18:59:56, Pavel Machek wrote:
> > > > Hi!
> > > >
> > > > > > > > Hmm. I moved keyboard between USB ports, and now 4.10-rc6 no longer
> > > > > > > > boots. v4.6 works ok. Let me try with keyboard unplugged... no, I
> > > > > > > > could not get it to work. I believe v4.9 and some v4.10-rc's worked,
> > > > > > > > but I'll have to double check.
> > > > > > >
> > > > > > > But all the kernel versions worked when the keyboard was plugged into
> > > > > > > its original USB port?
> > > > > >
> > > > > > > Aha. So it looks like the difference is probably not in "where is
> > > > > > > keyboard plugged in" but in "reboot" vs. "cold boot". I did not do a
> > > > > > > cold boot in quite a while :-(.
> > > > > >
> > > > > > Booting to grub, then hitting ctrl-alt-del is enough to make it work. Ouch.
> > > > > >
> > > > > > It happens with current Linus' tree.
> > > > >
> > > > > v4.10-rc6-feb3 : broken
> > > > > v4.9 : ok
> > > > > (v4.6 : ok)
> > > >
> > > > Hmm. It hangs during PCI fixups, and it hangs in v4.10-rc8, too.
> > > >
> > > > With debug patch below, I get
> > > >
> > > > ...1d.7: PCI fixup... pass 2
> > > > ...1d.7: PCI fixup... pass 3
> > > > ...1d.7: PCI fixup... pass 3 done
> > > >
> > > > ...followed by hang. So yes, it looks USB related.
> > > >
> > > > (Sometimes it hangs with some kind of backtrace involving secondary CPU
> > > > startup, unfortunately useful info is off screen at that point).
> > >
> > > Forgot to say, 1d.7 is EHCI controller.
> > >
> > > 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI
> > > Controller (rev 01)
> >
> > Ok, I should have access soon to an EeePC 1015CX (which seems to have this controller).
> > I hope I'll be able to reproduce the issue there. If not, I'm sorry but I'll have to
> > burden you again :-)
>
> Go through more mails.

I've read the whole thread several times, but I couldn't find many more clues.

> It is only reproducible after cold boot... so I doubt it will be easy to reproduce on another machine.

I have no idea. That's just my only hope for now.

>
> Now... I do have serial port, and I even might have serial cable
> somewhere, but.... Given how sensitive it is, it is probably going to
> go away with console on ttyS...

We'll see how it goes. I'll be off next week and then I should get the EeePC.
I'll get back to it then.

What surprises me is that the tick doesn't even fire yet at PCI quirks time,
at least not on my machine, where the clocksource is set up afterward. That said, if
some of the PCI quirks are async works, that might explain some later relation with the tick.

Thanks.