On Wed, 2023-02-01 at 14:40 +0000, Usama Arif wrote:
On 01/02/2022 20:53, David Woodhouse wrote:
Doing the INIT/SIPI/SIPI in parallel for all APs and *then* waiting for
them shaves about 80% off the AP bringup time on a 96-thread 2-socket
Skylake box (EC2 c5.metal) — from about 500ms to 100ms.
There are more wins to be had with further parallelisation, but this is
the simple part.
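
For illustration only, here is a toy timing model of the "kick all APs, then wait" idea described above. The function names and the per-AP costs (1 ms to issue INIT/SIPI/SIPI, 4 ms for an AP to come up) are assumptions picked to give round numbers. They are not measurements, and this is not how the kernel code is structured.

```python
# Toy model only: per-AP costs are assumed values, not measurements.

def serial_bringup_ms(n_aps, kick_ms, wait_ms):
    """Kick one AP, wait for it to come up, then move to the next."""
    return n_aps * (kick_ms + wait_ms)


def parallel_bringup_ms(n_aps, kick_ms, wait_ms):
    """Kick every AP back to back, then wait for all of them.

    The boot CPU still issues the kicks one at a time, but the APs
    come up concurrently, so only the last AP's wait is on the
    critical path.
    """
    return n_aps * kick_ms + wait_ms


def reduction(before_ms, after_ms):
    """Fraction of the original time that was saved."""
    return (before_ms - after_ms) / before_ms


if __name__ == "__main__":
    n_aps = 95  # 96 threads minus the boot CPU
    serial = serial_bringup_ms(n_aps, kick_ms=1, wait_ms=4)
    parallel = parallel_bringup_ms(n_aps, kick_ms=1, wait_ms=4)
    print(serial, parallel, round(reduction(serial, parallel), 2))
```

With these assumed costs the model gives 475 ms serial versus 99 ms parallel, in the same ballpark as the roughly 500ms to 100ms quoted above. In this model, what remains in the parallel case is mostly the boot CPU issuing the kicks one by one.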
Hi,
We are interested in reducing the boot time of servers (with kexec), and
smpboot takes up a significant amount of time while booting. When
testing the patch series (rebased to v6.1) on a server with 128 CPUs
split across 2 NUMA nodes, it brought down the smpboot time from ~700ms
to 100ms. Adding another cpuhp state for do_wait_cpu_initialized to make
sure cpu_init is reached (as done in v1 of the series + using the
cpu_finishup_mask) brought it down further to ~30ms.
I just wanted to check what was needed to progress the patch series
further for review? There weren't any comments on v4 of the patch so I
couldn't figure out what more is needed. I think it's quite useful to
have this working, so I would be really glad to help with anything
needed to restart the review.
I believe the only thing holding it back was the fact that it broke on
some AMD CPUs.
We don't *think* there are any remaining software issues; we think it's
hardware. Either an actual hardware race in CPU or chipset, or perhaps
even something as simple as a voltage regulator which can't cope with
an increase in power draw from *all* the CPUs at the same time.
We have prodded AMD a few times to investigate, but so far to no avail.
Last time I actually spoke to Thomas in person, I think he agreed that
we should just merge it and disable the parallel mode for the affected
AMD CPUs.
If you've already rebased to a newer kernel and tested it, perhaps now
is the time to do just that.