The assumption that all CPUs are the same is not always true in
practice: people buy a system and don't always fully populate it
initially, and when they add processors, the new ones have a more
recent stepping. So reusing microcode or updating in parallel would
add complexity, and 2 sec for 1024 CPUs puts a pretty low upper bound
on possible improvement. Does further improvement to a one-time small
delay justify the additional complexity?

On 5 March 2010 18:42, Dimitri Sivanich <sivanich@xxxxxxx> wrote:
We've noticed that on large SGI UV system configurations, running
microcode.ctl can take very long periods of time. This is due to
the large number of vmalloc/vfree calls made by the Intel
generic_load_microcode() logic.

By reusing allocated space, the following patch reduces the time
to run microcode.ctl on a 1024 cpu system from approximately 80
seconds down to 1 or 2 seconds.

Signed-off-by: Dimitri Sivanich <sivanich@xxxxxxx>
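
For readers without the patch at hand, the idea of reusing allocated
space can be illustrated roughly as below. This is a userspace sketch,
not the patch itself: malloc/free stand in for vmalloc/vfree, and
struct scratch_buf, scratch_get(), scratch_release() and scan_cpus()
are hypothetical names made up for the illustration. The buffer is
only reallocated when a request is larger than what is already held.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch buffer that is kept and reused between calls. */
struct scratch_buf {
    void *data;           /* current allocation, or NULL */
    size_t cap;           /* size of the current allocation in bytes */
    unsigned long allocs; /* how many allocations were actually made */
};

/* Return at least `size` bytes, allocating only if the buffer is too small. */
static void *scratch_get(struct scratch_buf *sb, size_t size)
{
    assert(sb != NULL);
    if (size > sb->cap) {
        free(sb->data);            /* stand-in for vfree() */
        sb->data = malloc(size);   /* stand-in for vmalloc() */
        if (!sb->data) {
            sb->cap = 0;
            return NULL;
        }
        sb->cap = size;
        sb->allocs++;
    }
    return sb->data;
}

/* Free the buffer once, after all CPUs have been handled. */
static void scratch_release(struct scratch_buf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->cap = 0;
}

/* Simulate one pass over `nr_cpus` CPUs, each needing `size` bytes. */
static unsigned long scan_cpus(struct scratch_buf *sb,
                               unsigned int nr_cpus, size_t size)
{
    for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
        void *buf = scratch_get(sb, size);
        if (!buf)
            break;
        memset(buf, 0, size); /* stand-in for copying a candidate image */
    }
    return sb->allocs;
}
```

With equal-sized requests, a pass over 1024 simulated CPUs performs a
single allocation instead of one allocate/free pair per CPU.
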
This approach seems reasonable in the scope of the current framework.

Acked-by: Dmitry Adamushko <dmitry.adamushko@xxxxxxxxx>

However, I think a better approach would be to have some kind of
shared storage for loaded microcode updates. Given that for the
majority of SMP systems all the cpus are normally updated to the very
same new instance of microcode, it should be enough to do a search for
the first cpu, cache the instance of microcode and then reuse it for
others.
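
A rough sketch of what such a cache might look like, in plain userspace
C. Everything here is an assumption made for illustration: struct
cpu_id, struct ucode_image, struct ucode_cache, search_images() and
find_ucode() are hypothetical names, and matching on exact equality of
signature and platform flags is a simplification of the real matching
rules. The full search runs for the first CPU, the result is
remembered, and it is reused for any later CPU with the same identity.
A CPU with a different stepping simply misses the cache and falls back
to the full search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified CPU identity: signature plus platform flags. */
struct cpu_id {
    unsigned int sig;
    unsigned int pf;
};

/* Hypothetical microcode image descriptor. */
struct ucode_image {
    struct cpu_id id;
    unsigned int rev; /* revision; higher is newer */
};

/* Single-entry cache holding the image found by the last full search. */
struct ucode_cache {
    const struct ucode_image *last;
    unsigned long searches; /* number of full searches performed */
};

/* Full search: newest image whose identity matches `id`, or NULL. */
static const struct ucode_image *
search_images(const struct ucode_image *imgs, size_t n, struct cpu_id id)
{
    const struct ucode_image *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (imgs[i].id.sig == id.sig && imgs[i].id.pf == id.pf &&
            (!best || imgs[i].rev > best->rev))
            best = &imgs[i];
    }
    return best;
}

/* Try the cached image first; fall back to a full search on a miss. */
static const struct ucode_image *
find_ucode(struct ucode_cache *c, const struct ucode_image *imgs,
           size_t n, struct cpu_id id)
{
    const struct ucode_image *img;

    assert(c != NULL);
    if (c->last && c->last->id.sig == id.sig && c->last->id.pf == id.pf)
        return c->last; /* same CPU type as before: reuse */

    c->searches++;
    img = search_images(imgs, n, id);
    if (img)
        c->last = img; /* remember for the following CPUs */
    return img;
}
```

On a homogeneous system this performs one full search no matter how
many CPUs are present. On a system with mixed steppings, a single-entry
cache like this one searches again each time the CPU type changes, so
the saving depends on how the CPUs are ordered.
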