Re: [PATCH linux-next][RFC] powerpc: avoid lockdep when we are offline

From: Zhouyi Zhou
Date: Mon Oct 10 2022 - 00:25:39 EST


On Mon, Oct 10, 2022 at 11:49 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> On Thu Sep 29, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > On Wed, Sep 28, 2022 at 10:51 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> > >
> > > On Wed Sep 28, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > > > Thanks, Nick, for reviewing my patch.
> > > >
> > > > On Tue, Sep 27, 2022 at 12:25 PM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> > > > >
> > > > > On Tue Sep 27, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > > > > > This is the second version of my fix for PPC's "WARNING: suspicious RCU usage";
> > > > > > I improved the fix under Paul E. McKenney's guidance:
> > > > > > Link: https://lore.kernel.org/lkml/20220914021528.15946-1-zhouzhouyi@xxxxxxxxx/T/
> > > > > >
> > > > > > During CPU offlining, the subfunctions of xive_teardown_cpu will
> > > > > > call __lock_acquire when CONFIG_LOCKDEP=y. The latter function
> > > > > > traverses an RCU-protected list, so "WARNING: suspicious RCU usage" is
> > > > > > triggered.
> > > > > >
> > > > > > Avoid lockdep when we are offline.
> > > > >
> > > > > I don't see how this is safe. If RCU is no longer watching the CPU then
> > > > > the memory it is accessing here could be concurrently freed. I think the
> > > > > warning is valid.
> > > > Agreed.
> > > > >
> > > > > powerpc's problem is that cpuhp_report_idle_dead() is called before
> > > > > arch_cpu_idle_dead(), so it must not rely on any RCU protection there.
> > > > > I would say xive cleanup just needs to be done earlier. I wonder why it
> > > > > is not done in __cpu_disable or thereabouts; that's where the interrupt
> > > > > controller is supposed to be stopped.
> > > > Yes. From kgdb debugging, I learned the following sequence of events:
> > > > __cpu_disable -> pseries_cpu_disable -> set_cpu_online(cpu, false)
> > > > leads to => do_idle: if (cpu_is_offline(cpu)) -> arch_cpu_idle_dead,
> > > > so xive cleanup should be done in pseries_cpu_disable.
> > >
> > > It's a good catch and a reasonable approach to the problem.
> > Thanks, Nick, for your encouragement ;-)
> > >
> > > > But as a beginner, I am afraid I am not competent to do the above
> > > > sophisticated work without errors, although I would very much like to.
> > > > Could an expert do this for us?
> > >
> > > This will be difficult for anybody, it's tricky code. I'm not an
> > > expert at it.
> > >
> > > It looks like the interrupt controller disable split has been there
> > > since long before xive. I would try just moving them together and then
> > > see if that works.
> > Yes, I used "git blame" (which I learned from Paul E. McKenney ;-))
> > and saw the same.
> > I look forward to your great work!
>
> I was thinking you could try it and see if it works and what you find.
> Are you interested, and do you have time to look into it?
I am interested, and I have time ;-)
Thanks, Nick, for your trust in me!
I plan to submit my humble work in about a month (counting the
rcutorture testing time); thank you in advance for your patience.

Cheers
Zhouyi
>
> Thanks,
> Nick