Re: [External] Re: [PATCH] x86/purgatory: provide config to disable purgatory

From: Eric W. Biederman
Date: Mon Nov 29 2021 - 11:56:53 EST


Usama Arif <usama.arif@xxxxxxxxxxxxx> writes:

> Hi,
>
> Thanks for your replies. I have submitted a v2 of the patch with a
> much more detailed commit message including reason for the patch and timing values.
>
> The time taken from reboot to running init process was measured
> with both purgatory enabled and disabled over 20 runs and the
> averages are:
> Purgatory disabled:
> - TSC = 3908766161 cycles
> - ktime = 606.8 ms
> Purgatory enabled:
> - TSC = 5005811885 cycles (28.1% worse)
> - ktime = 843.1 ms (38.9% worse)
>
>
> Our reason for this patch is that it helps reduce the downtime of servers when
> the host kernel managing multiple VMs needs to be updated via kexec,
> but it makes reboot with kexec much faster so should be a general improvement in
> boot time if purgatory is not needed and could have other use cases as well.
> I believe only x86, powerpc and s390 have purgatory supported; other platforms
> like arm64 don't have it implemented yet, so with the reboot time improvement seen,
> it would be a good idea to have the option to disable purgatory completely but set default to y.
> We also have the CONFIG_KEXEC_BZIMAGE_VERIFY_SIG which can be enabled to verify the next
> kernel image to be booted and purgatory can be completely skipped if
> not required.

CONFIG_KEXEC_BZIMAGE_VERIFY_SIG is something totally and completely
different. Its job is to verify that the kernel to be booted comes
from a trusted source. The job of the sha256 verification in purgatory
is to verify that the memory the kernel cares about was not corrupted
during the kexec process.
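
As a rough illustration of that second job, here is a minimal sketch in
Python. It is not the actual purgatory code: the function names and the
sample segments are made up for the example. It only shows the idea of
recording a digest over the loaded segments at load time and recomputing
it just before jumping to the new kernel.

```python
import hashlib


def digest_segments(segments):
    """Return one sha256 digest covering all segments, in order."""
    h = hashlib.sha256()
    for seg in segments:
        h.update(seg)
    return h.digest()


def verify_before_jump(segments, expected_digest):
    """Recompute the digest and compare it with the one recorded at load time."""
    return digest_segments(segments) == expected_digest


# Load time: the segments are placed in memory and their digest is recorded.
# (Hypothetical contents standing in for the kernel image, initrd, etc.)
segments = [
    bytearray(b"kernel image"),
    bytearray(b"initrd"),
    bytearray(b"boot params"),
]
expected = digest_segments(segments)

# Nothing touched the memory: the check passes.
intact = verify_before_jump(segments, expected)

# Something (say, a stray DMA write) flips a single bit in one segment.
segments[0][3] ^= 0x01
still_ok = verify_before_jump(segments, expected)

print(intact, still_ok)
```

A signature check on the image file says nothing about this case; the
bytes were fine when loaded and went bad afterwards.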

I believe when you say purgatory you are really talking about that
sha256 checksum. It really is not possible to disable all of
the code that runs between kernels, as the old and the new kernel may
run at the same addresses. Anything that runs between the two kernels
is what is referred to as purgatory. Even if it is just a small
assembly stub.

That sha256 verification is always needed for kexec on panic. By the
nature of a kernel panic there are too many unknowns to have any
confidence that the new kernel will not be corrupted in the process of
kexec before it gets started.

For an ordinary kexec it might be possible to say that you have a
reliable kernel shutdown process and you know for a fact that something
won't come along and corrupt the kernel. I find that a questionable
assertion. I haven't yet seen anyone whose focus, when getting an
ordinary kexec to work, was anything other than making certain all of
the drivers are shut down properly.

I have seen countless times when a network packet comes in at the wrong
time and the target kernel's memory is corrupted before it gets far
enough to initialize the network driver.

For a 0.2s speed up you are talking about disabling all of the safety
checks in a very dangerous situation. How much can you gain in
performance by optimizing the sha256 implementation instead of using
what is essentially a reference implementation in basic C that I copied
from somewhere long ago?
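
For reference, the roughly 0.2s figure follows directly from the
averages quoted above. A small sketch of the arithmetic (the variable
names are just for illustration):

```python
# Averages over 20 runs, as quoted from the v2 commit message.
ktime_disabled_ms = 606.8
ktime_enabled_ms = 843.1
tsc_disabled = 3908766161
tsc_enabled = 5005811885

# Absolute cost of running with purgatory enabled, in milliseconds.
delta_ms = ktime_enabled_ms - ktime_disabled_ms

# Relative cost, as a percentage of the purgatory-disabled time.
ktime_pct = 100.0 * delta_ms / ktime_disabled_ms
tsc_pct = 100.0 * (tsc_enabled - tsc_disabled) / tsc_disabled

print(round(delta_ms, 1), round(ktime_pct, 1), round(tsc_pct, 1))
```

That difference is the upper bound on what disabling the check can
save; a faster sha256 and copy loop would shrink it.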

Optimize the sha256 implementation and the memory copy loop and then
show how the tiny bit of time that is left is on a mission critical path
and must be removed. Then we can reasonably talk about a config option
for disabling the sha256 verification in the non-panic kexec case.

That sha256 verification is there in part so that we can all sleep at
night because we don't have to deal with very, very strange heisenbugs.

Eric