Re: [PATCH v2] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

From: Thiago Jung Bauermann
Date: Mon Mar 11 2019 - 15:30:08 EST



Hello Gautham,

Thanks for your review.

Gautham R Shenoy <ego@xxxxxxxxxxxxxxxxxx> writes:

> Hello Thiago,
>
> On Fri, Feb 22, 2019 at 07:57:52PM -0300, Thiago Jung Bauermann wrote:
>> I see two cases that can be causing this race:
>>
>> 1. It's possible that CPU 134 was inactive at the time it was unplugged. In
>> that case, dlpar_offline_cpu() calls H_PROD on that CPU and immediately
>> calls pseries_cpu_die(). Meanwhile, the prodded CPU activates and starts
>> the process of stopping itself. It's possible that the busy loop is not
>> long enough to allow for the CPU to wake up and complete the stopping
>> process.
>
> The problem is a bit more severe since, after printing "Querying
> DEAD?" for CPU X, this CPU can prod another offline CPU Y on the same
> core which, on waking up, will call rtas_stop_self. Thus we can have two
> concurrent calls to rtas_stop_self, which is prohibited by the PAPR.

Indeed, very good point. I added this information to the patch
description.

>> 2. If CPU 134 was online at the time it was unplugged, it would have gone
>> through the new CPU hotplug state machine in kernel/cpu.c that was
>> introduced in v4.6 to get itself stopped. It's possible that the busy
>> loop in pseries_cpu_die() was long enough for the older hotplug code but
>> not for the new hotplug state machine.
>
> I haven't been able to observe the "Querying DEAD?" messages for the
> online CPU which was being offlined and dlpar'ed out.

Ah, thanks for pointing this out. That was a scenario I thought could
happen when I was investigating this issue but I never confirmed whether
it could really happen. I removed it from the patch description.

>> I don't know if this race condition has any ill effects, but we can make
>> the race a lot more even if we only start querying whether the CPU is
>> stopped once the stopping CPU is close to calling rtas_stop_self().
>>
>> Since pseries_mach_cpu_die() sets the CPU's current state to offline almost
>> immediately before calling rtas_stop_self(), we use that as a signal that
>> it is either already stopped or very close to that point, and we can start
>> the busy loop.
>>
>> As suggested by Michael Ellerman, this patch also changes the busy loop to
>> wait for a fixed amount of wall time.
>>
>> Signed-off-by: Thiago Jung Bauermann <bauerman@xxxxxxxxxxxxx>
>> ---
>> arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> I tried to estimate good amounts for the timeout and loop delays, but
>> I'm not sure how reasonable my numbers are. The busy loops will wait for
>> 100 µs between each try, and spin_event_timeout() will time out after
>> 100 ms. I'll be happy to change these values if you have better
>> suggestions.
>
> Based on the measurements that I did on a POWER9 system, in successful
> cases of smp_query_cpu_stopped(cpu) returning affirmative, the maximum
> time spent inside the loop was 10 ms.

That's very good to know. I added this information to the patch
description.

I also credited you with an Analyzed-by tag; I hope that's fine with you.
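
For illustration only, the first phase of the wait described above (poll every 100 µs, give up after 100 ms of wall time, and only then start querying whether the CPU has stopped) can be modelled in user space roughly as below. This is a sketch under assumptions: poll_timeout_us(), delay_us() and the timestamp-driven "dying CPU" are hypothetical stand-ins, not the kernel's spin_event_timeout(), udelay() or pseries_mach_cpu_die().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Monotonic wall-clock time in microseconds. */
static long long now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* Sleep for about us microseconds (hypothetical user-space stand-in for udelay()). */
static void delay_us(long us)
{
	struct timespec ts = { .tv_sec = us / 1000000,
			       .tv_nsec = (us % 1000000) * 1000L };

	nanosleep(&ts, NULL);
}

/*
 * Hypothetical analogue of spin_event_timeout(cond, timeout, delay):
 * evaluate cond() every poll_us microseconds until it returns true or
 * timeout_us of wall time has elapsed; return the last value of cond().
 */
static bool poll_timeout_us(bool (*cond)(void), long long timeout_us, long poll_us)
{
	long long deadline;
	bool ok;

	assert(poll_us > 0 && timeout_us >= 0);
	deadline = now_us() + timeout_us;
	while (!(ok = cond()) && now_us() < deadline)
		delay_us(poll_us);
	return ok;
}

/* Simulated dying CPU: its current state reads "offline" from this time on. */
static long long offline_at_us;

static bool cpu_marked_offline(void)
{
	return now_us() >= offline_at_us;
}

/* A condition that never becomes true, to exercise the timeout path. */
static bool never_offline(void)
{
	return false;
}
```

Bounding the loop by a deadline rather than an iteration count keeps the maximum wait the same regardless of how long each individual poll takes.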

>> Gautham was able to test this patch and it solved the race condition.
>>
>> v1 was a cruder patch which just increased the number of loops:
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html
>>
>> v1 also mentioned a kernel crash but Gautham narrowed it down to a bug
>> in RTAS, which is in the process of being fixed.
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 97feb6e79f1a..424146cc752e 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -214,13 +214,21 @@ static void pseries_cpu_die(unsigned int cpu)
>> msleep(1);
>> }
>> } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> + /*
>> + * If the current state is not offline yet, it means that the
>> + * dying CPU (which is in pseries_mach_cpu_die) didn't have a
>> + * chance to call rtas_stop_self yet and therefore it's too
>> + * early to query if the CPU is stopped.
>> + */
>> + spin_event_timeout(get_cpu_current_state(cpu) == CPU_STATE_OFFLINE,
>> + 100000, 100);
>>
>> for (tries = 0; tries < 25; tries++) {
>
> Can we bump up the tries to 100, so that we wait for 10 ms before
> printing the warning message?

Good idea. I increased the loop to 200 iterations so that it can take up
to 20 ms, just to be sure.
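
A similarly hedged sketch of the revised query loop: with 200 tries and a 100 µs delay after each negative answer, the delays alone add up to 200 × 100 µs = 20 ms, versus 10 ms for the 100 tries suggested above. wait_for_stopped() and the simulated query functions are hypothetical, and the enum only mimics the QCSS_* results checked in the patch; it is not the kernel's definition.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define QUERY_TRIES    200 /* 100 was suggested; 200 leaves extra margin */
#define QUERY_DELAY_US 100 /* stands in for udelay(100) between queries */

/* Upper bound on the time spent in the delays alone: 200 * 100 us = 20 ms. */
static_assert(QUERY_TRIES * QUERY_DELAY_US == 20000, "about 20 ms of delays");

/* Hypothetical stand-ins for the results of smp_query_cpu_stopped(). */
enum qcss { QCSS_NOT_STOPPED, QCSS_STOPPED, QCSS_HARDWARE_ERROR };

/* Sleep for about us microseconds (user-space stand-in for udelay()). */
static void delay_us(long us)
{
	struct timespec ts = { .tv_sec = us / 1000000,
			       .tv_nsec = (us % 1000000) * 1000L };

	nanosleep(&ts, NULL);
}

/*
 * Query up to QUERY_TRIES times, pausing QUERY_DELAY_US after each negative
 * answer. Returns how many queries it took to see QCSS_STOPPED or
 * QCSS_HARDWARE_ERROR, or -1 if the CPU never reported either (the case in
 * which the kernel code goes on to print its warning).
 */
static int wait_for_stopped(enum qcss (*query)(void))
{
	for (int tries = 0; tries < QUERY_TRIES; tries++) {
		enum qcss status = query();

		if (status == QCSS_STOPPED || status == QCSS_HARDWARE_ERROR)
			return tries + 1;
		delay_us(QUERY_DELAY_US);
	}
	return -1;
}

/* Simulated CPU that reports "stopped" on the third query. */
static int queries;

static enum qcss stops_on_third_query(void)
{
	return ++queries >= 3 ? QCSS_STOPPED : QCSS_NOT_STOPPED;
}

/* Simulated CPU that never stops, to exercise the limit. */
static enum qcss never_stops(void)
{
	return QCSS_NOT_STOPPED;
}
```

Unlike a cpu_relax() loop, whose total duration depends on how fast each iteration runs, the fixed delay makes the worst case roughly predictable from the two constants.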

>> cpu_status = smp_query_cpu_stopped(pcpu);
>> if (cpu_status == QCSS_STOPPED ||
>> cpu_status == QCSS_HARDWARE_ERROR)
>> break;
>> - cpu_relax();
>> + udelay(100);
>> }
>> }
>>


--
Thiago Jung Bauermann
IBM Linux Technology Center