Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow

From: Prarit Bhargava
Date: Tue Mar 25 2014 - 07:36:34 EST

Next message: Andrew Murray: "Re: [RESEND: RFC PATCH 3/3] pcie: keystone: add pcie driver based on designware core driver"
Previous message: Tomasz Figa: "Re: [PATCH] clocksource: exynos_mct: Fix stall after CPU hotplugging"
Next in thread: Igor Mammedov: "Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 03/19/2014 08:54 AM, Igor Mammedov wrote:
> On Wed, 19 Mar 2014 07:51:05 -0400
> Prarit Bhargava <prarit@xxxxxxxxxx> wrote:
>
>>
>>
>> On 03/18/2014 02:49 PM, Igor Mammedov wrote:
>>> On Tue, 18 Mar 2014 08:21:19 -0400
>>> Prarit Bhargava <prarit@xxxxxxxxxx> wrote:
>>>
>>>>
>>>>
>>>> On 03/13/2014 10:25 AM, Igor Mammedov wrote:
>>>>> Hang is observed on virtual machines during CPU hotplug,
>>>>> especially in big guests with many CPUs. (It happens more
>>>>> often if host is over-committed).
>>>>>
>>>>
>>>> Hey Igor, I like this better than the previous version. Thanks for taking into
>>>> account the possible races in this code.
>>>>
>>>> A quick question on system behaviour. As you know I've been more concerned
>>>> lately with error handling, etc., through the cpu hotplug code as we've seen
>>>> several customer reports of silent failures or cascading failures in the cpu
>>>> hotplug code when users have been attempting to perform physical hotplug.
>>>>
>>>> After your patches have been applied, in theory the following can happen:
>>>>
>>>> The master CPU is completing the AP cpu's bring up. The AP cpu is doing (sorry
>>>> for the cut-and-paste),
>>>>
>>>> void cpu_init(void)
>>>> {
>>>> 	int cpu = smp_processor_id();
>>>> 	struct task_struct *curr = current;
>>>> 	struct tss_struct *t = &per_cpu(init_tss, cpu);
>>>> 	struct thread_struct *thread = &curr->thread;
>>>>
>>>> 	/*
>>>> 	 * wait till the master CPU completes it's STARTUP sequence,
>>>> 	 * and decides to wait till this AP boots
>>>> 	 */
>>>> 	while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
>>>> 		cpu_relax();
>>>> 		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
>>>> 			halt();
>>>> 	}
>>>>
>>>> and is spinning on cpu_relax(). Suppose something goes wrong and the softlockup
>>>> watchdog fires on the AP cpu:
>>>>
>>>> 1. Can it? :) i.e., will the softlockup watchdog fire at this point of the
>>>> AP init? Okay, I'm being really lazy and not looking at the code ;)
>>> It shouldn't; the CPU is in a pristine state, having just come from the boot
>>> trampoline, and interrupts are not configured yet at this point.
>>
>> Okay, not a big problem.
>>
>>>
>>>>
>>>> 2. Is there anything we can do in this code to notify the user of a problem?
>>>> Even a pr_crit() here I think would help to indicate what went wrong; it might
>>>> be useful for future debugging in this area to have some sort of output. I
>>>> think a WARN() or BUG() is necessary here as there are several calls to cpu_init().
>>> Do you mean something like this:
>>>
>>> +		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID) {
>>> +			WARN(1);
>>> +			halt();
>>> +		}
>>
>> Yeah, maybe WARN(1, "some comment") though.
> printk at such an early stage might cause issues, since it is quite complex.
> It disables/enables irqs, calls *_delay_*() functions and takes locks.
> The last is especially dangerous: if the AP is shot down by another
> INIT/SIPI while it holds those locks, the system will hang on the next
> printk.

early_printk()?
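
To spell out the lock hazard for anyone following along, here is a rough
user-space sketch. Every name in it is made up for illustration (it is not the
kernel's console locking), and "reset" is modelled simply by returning without
unlocking.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock taken inside printk(); not a real kernel lock. */
static atomic_flag console_lock_sketch = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool sketch_trylock(void)
{
	return !atomic_flag_test_and_set(&console_lock_sketch);
}

static void sketch_unlock(void)
{
	atomic_flag_clear(&console_lock_sketch);
}

/*
 * Models an AP that enters printk(), takes the lock, and is then reset by a
 * new INIT/SIPI before it can unlock. Returning early stands in for the reset.
 */
static void ap_printk_then_reset(void)
{
	(void)sketch_trylock();
	/* AP is reset here: sketch_unlock() is never reached. */
}

/*
 * Models the next printk() on the master CPU. Real code would spin forever;
 * the sketch only reports whether it would have made progress.
 */
static bool master_printk_would_progress(void)
{
	if (!sketch_trylock())
		return false;
	sketch_unlock();
	return true;
}
```

Once ap_printk_then_reset() has run, master_printk_would_progress() keeps
returning false, which is the hang described above.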

> That case is possible if the master CPU got errors during wakeup_ap() and
> cpu_up() failed, and the CPU was then unplugged + plugged via ACPI and an
> attempt was made to online it again.
>
> It's much safer not to do anything complex this early in AP start-up.
>
> BTW:
> When the AP reaches the halt() line, the failure is not silent. The master CPU
> might print an error message if debug level logging is active;
> see arch/x86/kernel/smpboot.c:native_cpu_up()
> ...
> 	if (err) {
> 		pr_debug("do_boot_cpu failed %d\n", err);
> 		return -EIO;
> 	}
> ...
>
> Perhaps we should change pr_debug to pr_crit here to make it more visible,
> something like:
>
> @@ -858,7 +858,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
>
>  	err = do_boot_cpu(apicid, cpu, tidle);
>  	if (err) {
> -		pr_debug("do_boot_cpu failed %d\n", err);
> +		pr_crit("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
>  		return -EIO;
>  	}
>

Yes, this is a good idea.
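
The reason it matters: pr_debug() produces no output unless DEBUG is defined
for the file or dynamic debug is enabled for that call site, so on a default
build the failure path only returns -EIO. A small user-space sketch of the
difference, with made-up sketch_* macros standing in for the kernel ones and
SKETCH_DEBUG standing in for DEBUG:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins; the kernel's pr_debug()/pr_crit() are more involved. */
static int sketch_visible_msgs;

#define sketch_pr_crit(...) \
	do { sketch_visible_msgs++; fprintf(stderr, __VA_ARGS__); } while (0)

#ifdef SKETCH_DEBUG
#define sketch_pr_debug(...) sketch_pr_crit(__VA_ARGS__)
#else
#define sketch_pr_debug(...) do { } while (0)	/* compiled out by default */
#endif

/* Mirrors the error path of native_cpu_up() before the change. */
static int cpu_up_before(int err, unsigned int cpu)
{
	(void)cpu;
	if (err) {
		sketch_pr_debug("do_boot_cpu failed %d\n", err);
		return -EIO;
	}
	return 0;
}

/* Mirrors the error path with pr_crit() as proposed in the quoted hunk. */
static int cpu_up_after(int err, unsigned int cpu)
{
	if (err) {
		sketch_pr_crit("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
		return -EIO;
	}
	return 0;
}
```

Both variants return the same error code; only the second one leaves a trace
in the log without extra configuration.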

>
>>
>>>
>>>>
>>>> 3. Change this comment:
>>>>
>>>> * wait till the master CPU completes it's STARTUP sequence,
>>>> * and decides to wait till this AP boots
>>>>
>>>> to
>>>>
>>>> /* wait for the master CPU to complete this cpu's STARTUP. */ ?
>>> Well, that is not quite the same as above; the comment should underline that
>>> the AP waits for an ACK from the master CPU before continuing with its
>>> initialization.
>>>
>>> How about:
>>> /* wait for ACK from master CPU before continuing with AP initialization */
>>
>> Awesome :)
>>
>> P.
>>
>>>
>>>>
>>>> Apologies for the late review,
>>>>
>>>> P.
>>>
>>>
>
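
For reference, the wait-for-ACK loop discussed in this thread can be modelled
in user space roughly as below. This is only a sketch under assumptions: the
struct, the enum, SKETCH_BAD_APICID and the max_spins bound are all invented
for illustration (the real loop in cpu_init() is unbounded and calls
cpu_relax()), and how the APIC id ends up as BAD_APICID is outside what the
quoted code shows.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_BAD_APICID 0xFFFFu	/* hypothetical value, not the kernel's */

enum ap_wait_result { AP_CONTINUE, AP_HALT, AP_STILL_WAITING };

struct ap_slot_sketch {
	atomic_bool called_out;	/* models this CPU's bit in cpu_callout_mask */
	atomic_uint apicid;	/* models per_cpu(x86_cpu_to_apicid, cpu) */
};

/*
 * Poll for the master's ACK. Give up with AP_HALT if the APIC id has been
 * invalidated, and report AP_STILL_WAITING once the (sketch-only) spin
 * budget is used up.
 */
static enum ap_wait_result ap_wait_for_ack(struct ap_slot_sketch *slot,
					   unsigned int max_spins)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		if (atomic_load(&slot->called_out))
			return AP_CONTINUE;
		if (atomic_load(&slot->apicid) == SKETCH_BAD_APICID)
			return AP_HALT;
	}
	return AP_STILL_WAITING;
}
```

The ordering matches the quoted loop: the ACK is checked first, and the
BAD_APICID test only matters while the ACK has not arrived.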
