Re: [REGRESSION 4.20-rc1] 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")

From: Paul E. McKenney
Date: Wed Nov 14 2018 - 15:20:25 EST


On Tue, Nov 13, 2018 at 07:10:37AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 13, 2018 at 03:54:53PM +0200, Ville Syrjälä wrote:
> > Hi Paul,
> >
> > After 4.20-rc1 some of my 32bit UP machines no longer reboot/shutdown.
> > I bisected this down to commit 45975c7d21a1 ("rcu: Define RCU-sched
> > API in terms of RCU for Tree RCU PREEMPT builds").
> >
> > I traced the hang into
> > -> cpufreq_suspend()
> > -> cpufreq_stop_governor()
> > -> cpufreq_dbs_governor_stop()
> > -> gov_clear_update_util()
> > -> synchronize_sched()
> > -> synchronize_rcu()
> >
> > Only PREEMPT=y is affected for obvious reasons, but that couldn't
> > explain why the same UP kernel booted on an SMP machine worked fine.
> > Eventually I realized that the difference between working and
> > non-working machine was IOAPIC vs. PIC. With initcall_debug I saw
> > that we mask everything in the PIC before cpufreq is shut down,
> > and came up with the following fix:
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 7aa3dcad2175..f88bf3c77fc0 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2605,4 +2605,4 @@ static int __init cpufreq_core_init(void)
> > return 0;
> > }
> > module_param(off, int, 0444);
> > -core_initcall(cpufreq_core_init);
> > +late_initcall(cpufreq_core_init);
>
> Thank you for testing this and tracking it down!
>
> I am glad that you have a fix, but I hope that we can arrive at a less
> constraining one.
>
> > Here's the resulting change in inutcall_debug:
> > pci 0000:00:00.1: shutdown
> > hub 4-0:1.0: hub_ext_port_status failed (err = -110)
> > agpgart-intel 0000:00:00.0: shutdown
> > + PM: Calling cpufreq_suspend+0x0/0x100
> > PM: Calling mce_syscore_shutdown+0x0/0x10
> > PM: Calling i8259A_shutdown+0x0/0x10
> > - PM: Calling cpufreq_suspend+0x0/0x100
> > + reboot: Restarting system
> > + reboot: machine restart
> >
> > I didn't really look into what other ramifications the cpufreq
> > initcall change might have. cpufreq_global_kobject worries
> > me a bit. Maybe that one has to remain in core_initcall() and
> > we could just move the suspend to late_initcall()? Anyways,
> > I figured I'd leave this for someone more familiar with the
> > code to figure out ;)
>
> Let me guess...
>
> When the system suspends or shuts down, there comes a point after which
> there is only a single CPU that is running with preemption and interrupts
> are disabled. At this point, RCU must change the way that it works, and
> the commit you bisected to would make the change more necessary. But if
> I am guessing correctly, we have just been getting lucky in the past.
>
> It looks like RCU needs to create a struct syscore_ops with a shutdown
> function and pass this to register_syscore_ops(). Maybe a suspend
> function as well. And RCU needs to invoke register_syscore_ops() at
> a time that causes RCU's shutdown function to be invoked in the right
> order with respect to the other work in flight. The hope would be that
> RCU's suspend function gets called just as the system transitions into
> a mode where the scheduler is no longer active, give or take.
>
> Does this make sense, or am I confused?

Well, it certainly does not make sense in that blocking is still legal
at .shutdown() invocation time, which means that RCU cannot revert to
its boot-time approach at that point. Looks like I need hooks in a
bunch of arch-dependent functions. Which is certainly doable, but will
take a bit more digging.

Thanx, Paul