Re: [PATCH] x86: skip delays during SMP initialization similar to Xen

From: Len Brown
Date: Sat May 16 2015 - 05:08:10 EST


On Thu, May 14, 2015 at 1:57 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * "Jan H. Schönherr" <jschoenh@xxxxxxxxx> wrote:
>
>> Ingo, do you want an updated version of the original patch, which
>> takes care not to get stuck when the INIT deassertion is skipped, or
>> do you prefer to address delays "one by one" as you wrote elsewhere?
>
> So I'm not against improving this code at all, but instead of this
> hard to follow mixing of old and new code, I'd find the following
> approach cleaner and more acceptable: create a 'modern' and a 'legacy'
> SMP-bootup variant function, and do a clean separation based on the
> CPU model cutoff condition used by Len's patches:
>
> /* if modern processor, use no delay */
> if (((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && (boot_cpu_data.x86 == 6)) ||
> ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && (boot_cpu_data.x86 >= 0xF)))
> init_udelay = 0;
>
> Then in the modern variant we can become even more aggressive and
> remove these kinds of delays as well:

Not sure it is worth two versions, since this is not where the big
time is spent.
See below.

>
> udelay(300);

FWIW, MPS 1.4 suggests this should be 200, not 300.

> udelay(200);
>
> plus I'd suggest making these poll loops in smpboot.c narrower:
>
> udelay(100);

FWIW, on my desktop, this one executed 17 times (1700 usec).
This is the time for the remote CPU to wake and get to cpu_init().
Why is it a benefit to have any udelay() before invoking schedule()?

> udelay(100);

This one didn't execute at all. Indeed, per the question above,
I don't understand why it exists.

	/*
	 * Wait till AP completes initial initialization
	 */
	while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
		/*
		 * Allow other tasks to run while we wait for the
		 * AP to come online. This also gives a chance
		 * for the MTRR work(triggered by the AP coming online)
		 * to be completed in the stop machine context.
		 */
		udelay(100);
		schedule();
	}

So, the latest TIP has the INIT udelay(10,000) removed,
but cpu_up() still takes nearly 19,000 usec on a HSW desktop.

A quick scan of the ftrace output shows some high runners:

18949.450 us  cpu_up()
     2450.580 us  notifier_call_chain
          102.751 us  thermal_throttle_cpu_callback
          289.313 us  dpm_sysfs_add
         1019.594 us  msr_class_cpu_callback
         ...
     8455.462 us  native_cpu_up()
          500.000 us  = udelay(300) + udelay(200)  Startup IPI
          500.000 us  = udelay(300) + udelay(200)  Startup IPI
         1700.000 us  = 17 x udelay(100)  waiting for AP in initialized_map
         2004.172 us  check_tsc_warp()
     7977.799 us  cpu_notify()
         1588.108 us  cpuset_cpu_active
         3043.955 us  cacheinfo_cpu_callback
         1146.234 us  mce_cpu_callback
          541.105 us  cpufreq_cpu_callback
          213.685 us  coretemp_cpu_callback


cacheinfo_cpu_callback() time appears to be spent creating a bunch
of sysfs nodes, which is apparently an expensive operation.

check_tsc_warp() is hard-coded to take 2ms.
I don't know if 2ms is a magic number or if a shorter check would be just as effective.
It seems a bit sad to do this serially for every CPU at boot,
when we could check all the CPUs in parallel once they are online.
Perhaps this should be invoked only at boot time and hot-add time.
It shouldn't be needed at all for soft online and resume.

Startup IPI delays.
MPS 1.4 actually says 200+200, not the 300+200 that Linux uses.
I don't know where the 300 came from, maybe it was a typo?

msr_class_cpu_callback -- making device nodes is not fast.

I don't know if anything can be done for the 1700us wait
for the remote processor to mark itself initialized.
That is the 1st thing it does when it enters cpu_init().

On the Xeon, I had seen x86_init_rdrand() take 781 usec --
dunno why that isn't seen on this box. I'll look at the Xeon again next week.

cheers,
Len Brown, Intel Open Source Technology Center