Re: Kernel 5.3.x, 5.2.2+: VMware player suspend on 64/32 bit guests

From: Woody Suwalski
Date: Wed Aug 28 2019 - 11:18:13 EST


I have tried to "bisect" the config changes, and the working/not-working
builds between rc3, rc4 and rc5, and came out with the same frustrating
result: building a "clean" kernel does not produce the same behavior as
building incrementally while bisecting. For some reason, even getting to
the same config step by step does not make the kernel work, similar to
the actual bisecting.
So for now I simply use my patch to do the timeout.
Thinking of it - should I submit a patch like that to you for
consideration? It may be useful for other users with suspend
problems...
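
For reference, the idea behind the timeout is roughly the following. This
is only a userspace sketch of a bounded wait, not the actual patch: the
names fake_irq, irq_in_progress(), wait_for_irq_idle() and the MAX_SPINS
limit are all made up for illustration, and the "hardware" is simulated.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical poll limit; the value is an assumption, not from the patch. */
#define MAX_SPINS 1000

/* Simulated interrupt line (illustration only, not a kernel structure). */
struct fake_irq {
    int irq;          /* interrupt number, e.g. 9 */
    long busy_polls;  /* polls that still report "busy"; -1 means stuck forever */
};

/* Stand-in for "is this interrupt still in progress?". */
static bool irq_in_progress(struct fake_irq *d)
{
    if (d->busy_polls < 0)
        return true;            /* jammed: never goes idle */
    if (d->busy_polls == 0)
        return false;           /* idle */
    d->busy_polls--;
    return true;
}

/*
 * Poll until the line goes idle, but give up after MAX_SPINS polls
 * instead of spinning forever. Returns true if the line went idle,
 * false if the loop was left via the timeout.
 */
static bool wait_for_irq_idle(struct fake_irq *d)
{
    int spins;

    for (spins = 0; spins < MAX_SPINS; spins++) {
        if (!irq_in_progress(d))
            return true;
    }

    /* In kernel code this is where a WARN_ON() would show the context. */
    fprintf(stderr, "WARNING: irq%d still in progress after %d polls\n",
            d->irq, MAX_SPINS);
    return false;
}
```

A caller would treat a false return as "rescued by the timeout" and carry
on with the suspend, which is the case I keep hitting with irq9.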

Thanks, Woody

On Wed, Aug 21, 2019 at 4:15 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Tue, 20 Aug 2019, Woody Suwalski wrote:
> > On Thu, Aug 15, 2019 at 2:37 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > > On Tue, 13 Aug 2019, Woody Suwalski wrote:
> > > > On Mon, Aug 12, 2019 at 1:24 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > > > > The ACPI handler is not the culprit. This is either an emulation bug or
> > > > > something really strange. Can you please use a WARN_ON() if the loop is
> > > > > exited via the timeout so we can see in which context this happens?
> > > > >
> > > >
> > > > B. On 5.3-rc4 problem is gone. I guess it is overall good sign.
> > >
> > > Now the interesting question is what changed between 5.3-rc3 and
> > > 5.3-rc4. Could you please try to bisect that?
> > >
> >
> > Apparently I can not, and frustratingly I do not understand why.
> > Tried twice, and every time I get it broken to the end of bisection -
> > so the fixed-in-5.3-rc4 theory falls apart. Yet if I build cleanly
> > 5.3-rc4 or -rc5, it works OK.
> > Then on a 32 bit system - I first tried with a scaled-down kernel
> > (just with the drivers needed in the VM). That one is never working,
> > even in rc5. Yet the "full" kernel works OK. So now there is a config
> > issue variation on top of the other problem?
>
> Looks like it, and it would be good to know which knob it is.
>
> Can you send me the two configs please?
>
> > > dpm_suspend_noirq() is called with all CPUs online and interrupts
> > > enabled. In that case an interrupt pending in IRR does not make any sense
> > > at all. Confused.
> > >
> > For now I use a timeout counter patch - and it is showing 100% irq9
> > jammed and needing rescue. And I am even more confused...
>
> You're not alone, if that gives you a bit of comfort :)
>
> Thanks,
>
> tglx