RE: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again

From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Date: Sun Nov 21 2021 - 19:26:10 EST




> -----Original Message-----
> From: Sebastian Andrzej Siewior [mailto:bigeasy@xxxxxxxxxxxxx]
> Sent: Saturday, November 20, 2021 1:37 AM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@xxxxxxxxxx>
> Cc: peterz@xxxxxxxxxxxxx; valentin.schneider@xxxxxxx; mingo@xxxxxxxxxx;
> tglx@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Gonglei (Arei)
> <arei.gonglei@xxxxxxxxxx>
> Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> again
>
> Sorry for forgetting…
>
> On 2021-10-08 03:10:34 [+0000], Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.) wrote:
> > > -----Original Message-----
> > > From: Sebastian Andrzej Siewior [mailto:bigeasy@xxxxxxxxxxxxx]
> > > Sent: Thursday, September 30, 2021 10:01 PM
> > > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > <longpeng2@xxxxxxxxxx>
> > > Cc: peterz@xxxxxxxxxxxxx; valentin.schneider@xxxxxxx; mingo@xxxxxxxxxx;
> > > tglx@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Gonglei (Arei)
> > > <arei.gonglei@xxxxxxxxxx>
> > > Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> > > again
> > >
> > > On 2021-09-01 13:11:43 [+0800], Longpeng(Mike) wrote:
> > > > The CPU's cpu_hotplug_state is set to CPU_UP_PREPARE before the
> > > > CPU is woken up, but it is not reset if the bring-up fails. After
> > > > that the user can no longer bring the CPU online, because
> > > > cpu_check_up_prepare() rejects the CPU_UP_PREPARE state.
> > > >
> > > > We should allow the user to try again in this case.
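A minimal userspace sketch of the failure described above, assuming a simplified model: the enum values, the plain (non-atomic) state array, NR_CPUS_MODEL, model_check_up_prepare() and model_cpu_up() are all hypothetical stand-ins for the per-CPU atomic cpu_hotplug_state and cpu_check_up_prepare() in kernel/smpboot.c, and the error codes are the model's own choices. It only shows how a failed attempt leaves CPU_UP_PREPARE behind and how the unpatched check then turns that leftover state away.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: names mirror the kernel's hotplug
 * states, but the values and helpers are illustrative only. */
enum hp_state { CPU_ONLINE, CPU_UP_PREPARE, CPU_DEAD_FROZEN, CPU_BROKEN, CPU_POST_DEAD };

#define NR_CPUS_MODEL 2

/* Offline CPUs start in CPU_POST_DEAD so they can be brought up. */
static enum hp_state cpu_hotplug_state[NR_CPUS_MODEL] = { CPU_POST_DEAD, CPU_POST_DEAD };

/* Unpatched check: there is no case for CPU_UP_PREPARE. */
static int model_check_up_prepare(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	switch (cpu_hotplug_state[cpu]) {
	case CPU_POST_DEAD:
		cpu_hotplug_state[cpu] = CPU_UP_PREPARE;
		return 0;
	case CPU_DEAD_FROZEN:
		return -EBUSY;
	case CPU_BROKEN:
		return -EAGAIN;
	default:
		/* A CPU_UP_PREPARE left over from a failed attempt ends up here. */
		return -EIO;
	}
}

/* One bring-up attempt; ap_responds stands in for the AP showing signs of life. */
static int model_cpu_up(int cpu, bool ap_responds)
{
	int ret = model_check_up_prepare(cpu);

	if (ret)
		return ret;
	if (!ap_responds)
		return -EIO;	/* state is left at CPU_UP_PREPARE, as described */
	cpu_hotplug_state[cpu] = CPU_ONLINE;
	return 0;
}
```

For example, calling model_cpu_up(1, false) and then model_cpu_up(1, true) fails both times: the second attempt never reaches the wakeup step, because the leftover CPU_UP_PREPARE state falls through to the default branch of the check.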
> > >
> > > Can you please describe where it failed / how did you reach that state?
> > >
> >
> > native_cpu_up
> > cpu_check_up_prepare
> > do_boot_cpu
> > /* Wait 10s total for first sign of life from AP */
> >
> > It fails if the AP doesn't respond within 10s, and cpu_hotplug_state then
> > stays in the CPU_UP_PREPARE state.
> >
> > This can happen on a virtualized system, especially in some special use
> > cases, e.g. Software Enclaves [1][2].
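The "wait 10s total for first sign of life from AP" step can be pictured with the sketch below. It assumes a simulated millisecond clock instead of jiffies, and wait_for_sign_of_life(), ap_alive_at_ms, AP_NEVER_RESPONDS and the 100 ms poll interval are hypothetical names chosen for illustration; it does not reproduce how the real x86 code polls. It only shows why an AP whose wakeup events are dropped always ends up in the timeout path.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: time is simulated in milliseconds so the sketch runs
 * instantly; the real code measures the timeout in jiffies. */
#define FIRST_SIGN_OF_LIFE_TIMEOUT_MS 10000u
#define POLL_INTERVAL_MS 100u
#define AP_NEVER_RESPONDS UINT_MAX	/* e.g. INIT/SIPI dropped by the hypervisor */

static_assert(POLL_INTERVAL_MS > 0, "poll interval must be non-zero");

/* Returns true if the AP reports in before the deadline.  ap_alive_at_ms is
 * the simulated time at which the AP first shows signs of life. */
static bool wait_for_sign_of_life(unsigned int ap_alive_at_ms)
{
	unsigned int now;

	for (now = 0; now < FIRST_SIGN_OF_LIFE_TIMEOUT_MS; now += POLL_INTERVAL_MS) {
		if (now >= ap_alive_at_ms)
			return true;	/* AP showed up in time */
	}
	return false;			/* timed out; the caller reports failure */
}
```

With this model, an AP that answers at 9900 ms is still seen in time, while one that answers at 10000 ms or never (AP_NEVER_RESPONDS) is reported as a timeout, which corresponds to the point where the bring-up gives up while cpu_hotplug_state is still CPU_UP_PREPARE.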
>
> So wakeup_cpu_via_init_nmi() / wakeup_secondary_cpu() succeeds but the
> CPU does not show up within 10 seconds.
> Does the CPU come in later and spin in wait_for_master_cpu(), or is the
> CPU completely missing?
>

The CPU is completely missing at that point, since the hypervisor can reject
all events sent to this CPU while the enclave VM is running.

However, the CPU can receive events again and be brought up once the
enclave VM is terminated.
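A minimal sketch of this retry scenario with the proposed CPU_UP_PREPARE case applied, again as a userspace model rather than kernel code: hypervisor_blocks is a hypothetical flag standing in for the hypervisor dropping every wakeup event while the enclave VM runs, and check_up_prepare() and cpu_up_attempt() are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a single CPU; state names mirror the kernel's. */
enum hp_state { CPU_ONLINE, CPU_UP_PREPARE, CPU_DEAD_FROZEN, CPU_BROKEN, CPU_POST_DEAD };

static enum hp_state state = CPU_POST_DEAD;
static bool hypervisor_blocks;	/* true while the enclave VM is running */

/* Check with the RFC's extra case: a leftover CPU_UP_PREPARE is accepted. */
static int check_up_prepare(void)
{
	switch (state) {
	case CPU_POST_DEAD:
		state = CPU_UP_PREPARE;
		return 0;
	case CPU_UP_PREPARE:
		return 0;	/* the case proposed by the patch */
	case CPU_DEAD_FROZEN:
		return -EBUSY;
	case CPU_BROKEN:
		return -EAGAIN;
	default:
		return -EIO;
	}
}

static int cpu_up_attempt(void)
{
	int ret = check_up_prepare();

	if (ret)
		return ret;
	assert(state == CPU_UP_PREPARE);
	if (hypervisor_blocks)
		return -EIO;	/* AP never shows up; state stays CPU_UP_PREPARE */
	state = CPU_ONLINE;
	return 0;
}
```

Running one attempt with hypervisor_blocks set fails and leaves the state at CPU_UP_PREPARE; clearing the flag (the enclave VM has terminated) and trying again succeeds, because the patched check returns 0 for the leftover state instead of rejecting it.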


> > [1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> > [2] https://www.alibabacloud.com/help/doc-detail/203433.htm?spm=a3c0i.23986742.6981761520.1.7e30715eZCRXmk
> >
> >
> > > > Signed-off-by: Longpeng(Mike) <longpeng2@xxxxxxxxxx>
> > > > ---
> > > > kernel/smpboot.c | 7 +++++++
> > > > 1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> > > > index f6bc0bc..d18f8ff 100644
> > > > --- a/kernel/smpboot.c
> > > > +++ b/kernel/smpboot.c
> > > > @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
> > > >  		 */
> > > >  		return -EAGAIN;
> > > >
> > > > +	case CPU_UP_PREPARE:
> > > > +		/*
> > > > +		 * The CPU failed to bring up last time; allow the user
> > > > +		 * to keep trying to start it up.
> > > > +		 */
> > > > +		return 0;
> > > > +
> > > >  	default:
> > > >
> > > >  		/* Should not happen. Famous last words. */
>
> Sebastian