RE: [PATCH 1/4] irqchip/sifive-plic: Setup cpuhp once after current handler is present

From: Anup Patel
Date: Sun May 17 2020 - 04:03:04 EST




> -----Original Message-----
> From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-
> owner@xxxxxxxxxxxxxxx> On Behalf Of Anup Patel
> Sent: 16 May 2020 21:59
> To: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Palmer Dabbelt <palmer@xxxxxxxxxxx>; Paul Walmsley
> <paul.walmsley@xxxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Jason
> Cooper <jason@xxxxxxxxxxxxxx>; Atish Patra <Atish.Patra@xxxxxxx>; Alistair
> Francis <Alistair.Francis@xxxxxxx>; Anup Patel <anup@xxxxxxxxxxxxxx>; linux-
> riscv@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH 1/4] irqchip/sifive-plic: Setup cpuhp once after current
> handler is present
>
>
>
> > -----Original Message-----
> > From: Marc Zyngier <maz@xxxxxxxxxx>
> > Sent: 16 May 2020 19:01
> > To: Anup Patel <Anup.Patel@xxxxxxx>
> > Cc: Palmer Dabbelt <palmer@xxxxxxxxxxx>; Paul Walmsley
> > <paul.walmsley@xxxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>;
> > Jason Cooper <jason@xxxxxxxxxxxxxx>; Atish Patra
> > <Atish.Patra@xxxxxxx>; Alistair Francis <Alistair.Francis@xxxxxxx>;
> > Anup Patel <anup@xxxxxxxxxxxxxx>; linux- riscv@xxxxxxxxxxxxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH 1/4] irqchip/sifive-plic: Setup cpuhp once after
> > current handler is present
> >
> > On 2020-05-16 13:52, Anup Patel wrote:
> > >> -----Original Message-----
> > >> From: Marc Zyngier <maz@xxxxxxxxxx>
> > >> Sent: 16 May 2020 17:42
> > >> To: Anup Patel <Anup.Patel@xxxxxxx>
> > >> Cc: Palmer Dabbelt <palmer@xxxxxxxxxxx>; Paul Walmsley
> > >> <paul.walmsley@xxxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>;
> > >> Jason Cooper <jason@xxxxxxxxxxxxxx>; Atish Patra
> > >> <Atish.Patra@xxxxxxx>; Alistair Francis <Alistair.Francis@xxxxxxx>;
> > >> Anup Patel <anup@xxxxxxxxxxxxxx>;
> > >> linux-
> > >> riscv@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > >> Subject: Re: [PATCH 1/4] irqchip/sifive-plic: Setup cpuhp once
> > >> after current handler is present
> > >>
> > >> Hi Anup,
> > >>
> > >> On 2020-05-16 07:38, Anup Patel wrote:
> > >> > For multiple PLIC instances, the plic_init() is called once for
> > >> > each PLIC instance. Due to this we have two issues:
> > >> > 1. cpuhp_setup_state() is called multiple times 2.
> > >> > plic_starting_cpu() can crash for boot CPU if cpuhp_setup_state()
> > >> > is called before boot CPU PLIC handler is available.
> > >> >
> > >> > This patch fixes both above issues.
> > >> >
> > >> > Signed-off-by: Anup Patel <anup.patel@xxxxxxx>
> > >> > ---
> > >> > drivers/irqchip/irq-sifive-plic.c | 14 ++++++++++++--
> > >> > 1 file changed, 12 insertions(+), 2 deletions(-)
> > >> >
> > >> > diff --git a/drivers/irqchip/irq-sifive-plic.c
> > >> > b/drivers/irqchip/irq-sifive-plic.c
> > >> > index 822e074c0600..7dc23edb3267 100644
> > >> > --- a/drivers/irqchip/irq-sifive-plic.c
> > >> > +++ b/drivers/irqchip/irq-sifive-plic.c
> > >> > @@ -76,6 +76,7 @@ struct plic_handler {
> > >> > void __iomem *enable_base;
> > >> > struct plic_priv *priv;
> > >> > };
> > >> > +static bool plic_cpuhp_setup_done;
> > >> > static DEFINE_PER_CPU(struct plic_handler, plic_handlers);
> > >> >
> > >> > static inline void plic_toggle(struct plic_handler *handler, @@
> > >> > -282,6 +283,7 @@ static int __init plic_init(struct device_node *node,
> > >> > int error = 0, nr_contexts, nr_handlers = 0, i;
> > >> > u32 nr_irqs;
> > >> > struct plic_priv *priv;
> > >> > + struct plic_handler *handler;
> > >> >
> > >> > priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> > >> > if (!priv)
> > >> > @@ -310,7 +312,6 @@ static int __init plic_init(struct
> > >> > device_node *node,
> > >> >
> > >> > for (i = 0; i < nr_contexts; i++) {
> > >> > struct of_phandle_args parent;
> > >> > - struct plic_handler *handler;
> > >> > irq_hw_number_t hwirq;
> > >> > int cpu, hartid;
> > >> >
> > >> > @@ -364,9 +365,18 @@ static int __init plic_init(struct
> > >> > device_node *node,
> > >> > nr_handlers++;
> > >> > }
> > >> >
> > >> > - cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
> > >> > + /*
> > >> > + * We can have multiple PLIC instances so setup cpuhp state
> only
> > >> > + * when context handler for current/boot CPU is present.
> > >> > + */
> > >> > + handler = this_cpu_ptr(&plic_handlers);
> > >> > + if (handler->present && !plic_cpuhp_setup_done) {
> > >>
> > >> If there is no context handler for the boot CPU, the system is
> > >> doomed, right? It isn't able to get any interrupt, and you don't
> > >> register the hotplug notifier that could allow secondary CPUs to
> > >> boot.
> > >>
> > >> So what is the point? It feels like you should just give up here.
> > >>
> > >> Also, the boot CPU is always CPU 0. So checking that you only
> > >> register the hotplug notifier from CPU 0 should be enough.
> > >
> > > The boot CPU is not fixed in RISC-V, the logical id of the boot CPU
> > > will always be zero but physical id (or HART id) can be something
> > > totally different.
> >
> > So on riscv, smp_processor_id() can return a non-zero value on the the
> > boot CPU? Interesting... :-/
>
> On RISC-V, smp_processor_id() returns logical id (which is the order in Which
> CPUs are brought up).
>
> We have special function cpuid_to_hartid_map() in asm/smp.h which helps us
> convert logical id to HART id.
>
> The HART id in RISC-V world is like to MPIDR of ARM world.
>
> >
> > >
> > > On RISC-V NUMA system, we will have a separate PLIC in each NUMA node.
> > >
> > > Let's say we have a system with 2 NUMA nodes, each NUMA node having
> > > 4 CPUs (or 4 HARTs). In this case, the DTB passed to Linux will
> > > have two PLIC DT nodes where each PLIC device targets only 4 CPUs
> > > (or 4 HARTs). The
> > > plic_init() functions will setup handlers for only 4 CPUs (or 4 HARTs).
> > > In other
> > > words, plic_init() for "PLIC0" will setup handler for HART id 0 to 3
> > > and
> > > plic_init() for "PLIC1" will setup handler for HART id 4 to 7. Now,
> > > any CPU can be boot CPU so it is possible that CPU with HART id 4 is
> > > boot CPU and when plic_init() is first called for "PLIC0" the
> > > handler for HART id 4 is not setup because it will be setup later
> > > when
> > > plic_init() is called for "PLIC1".
> > > This cause plic_starting_cpu() to crash when plic_init() is called
> > > for "PLIC0".
> > >
> > > I hope above example helps understanding the issue.
> >
> > It does, thanks. This pseudo NUMA thing really is a terrible hack...
> >
> > >
> > > I encounter this issue randomly when booting Linux on QEMU RISC-V
> > > with multiple NUMA nodes.
> >
> > Then why don't you defer the probing of the PLIC you can't initialize
> > from this CPU? If you're on CPU4-7, only initialize the PLIC that
> > matters to you, and not the the others. It would certainly make a lot more
> sense, and be more robust.
>
> Okay, I will try -EPROBE defer approach for v2.

I tried -EPROBE_DEFER for PLIC instances not targeting boot CPU. It
works fine for boot CPU but now it fails for secondary CPUs because
PLIC probe is deferred after secondary CPU bringup.

This can be further explained by previous example.

Let's say we have a system with 2 NUMA nodes, each NUMA node
having 4 CPUs (or 4 HARTs). In this case, the DTB passed to Linux will
have two PLIC DT nodes where each PLIC device targets only 4 CPUs (or
4 HARTs). The plic_init() for "PLIC0" will setup handler for HART id 0 to 3
and plic_init() for "PLIC1" will setup handler for HART id 4 to 7. Now, any
CPU can be boot CPU so let's assume CPU with HART id 5 is the boot CPU.
The plic_init() is first called for "PLIC0" which does not target boot CPU
(HART id 5) so we defer "PLIC0" probe. Next, plic_init() is called for "PLIC1"
which succeeds and we get handler for CPUs with HART id 4 to 7 (including
boot CPU). Later, Linux brings-up secondary CPUs starting with HART id 0
which fails because plic_starting_cpu() crashes for HART id 0 as the "PLIC0"
probe is deferred (i.e. not yet done).

The real reason why -EPROBE_DEFER did not work in this case is that
Linux irq subsystem defers probe after secondary CPU bringup.

I will keep the current approach for v2 series unless you have other
suggestion.

Regards,
Anup