Hi Henry,If it is unsafe to call cpu_ops.cpu_die (or cpu_die) on Arm except from cpuhp shouldn't something detect that?
On 06/11/2020 23:25, Henry Willard wrote:
machine_shutdown() is called by kernel_kexec() to shutdownAll this variation is because we want to to get the CPU back in a sane state, as if we'd
the non-boot CPUs prior to starting the new kernel. The
implementation of machine_shutdown() varies by architecture.
Many make an interprocessor call, such as smp_send_stop(),
to stop the non-boot CPUs. On some architectures the CPUs make
some sort of firmware call to stop the CPU. On some architectures
without the necessary firmware support to stop the CPU, the CPUs
go into a disabled loop, which is not suitable for supporting
kexec. On Arm64 systems that support PSCI, CPUs can be stopped
with a PSCI CPU_OFF call.
just come from cold boot. Without the platform firmware doing its initialisation, the only
way we have of doing this is to run the cpuhp callbacks to take the CPU offline cleanly.
Executing user code
Arm64 machine_shutdown() uses the CPU hotplug infrastructure viaHmmm, looks like cpuhp doesn't have a way to run the callbacks in parallel...
smp_shutdown_nonboot_cpus() to stop each CPU. This is relatively
slow and takes a best case of .02 to .03 seconds per CPU which are
stopped sequentially.
This can take the better part of a second forBusy doing what?
all the CPUs to be stopped depending on how many CPUs are present.
If for some reason the CPUs are busy at the time of the kexec reboot,
it can take several seconds to shut them all down.
It might help the worst cases, but even on an idle system it takes a while.
I assume the problem is CPUs starting work on behalf of user-space, which is now
pointless, which prevents them from scheduling into the cpuhp work quickly.
Does hoisting kexec's conditional call to freeze_processes() above the #ifdef - so that
user-space threads are no longer schedule-able improve things here?
The requirement comes from the owner of the larger environment of which this embedded system is a part. So, yes, this is part of software maintenance of a component during regular operation.
Each CPU shuts itself down by calling PSCI CPU_OFF.Where does this requirement come from? Surely kexec is part of a software update, not
In some applications such as embedded systems, which need a very
fast reboot (less than a second), this may be too slow.
regular operation.
I believe this is done by irq_migrate_all_off_this_cpu(), which is called by take_cpu_down() scheduled on the processor to be shutdown by stop_machine_cpuslocked(). I missed that. While take_cpu_down() is running on the target processor the boot CPU is waiting, which is part of the reason for the latency.This patch reverts to using smp_send_stop() to signal allThis is appropriate for panic(), as we accept it may fail.
CPUs to stop immediately. Currently smp_send_stop() causes each cpu
to call local_cpu_stop(), which goes into a disabled loop. This patch
modifies local_cpu_stop() to call cpu_die() when kexec_in_progress
is true, so that the CPU calls PSCI CPU_OFF just as in the case of
smp_shutdown_nonboot_cpus().
For Kexec(), the CPU must go offline, otherwise we can't overwrite the code it was
running. The arch code can't just call CPU_OFF in any context. See 5.5 CPU_OFF' of
https://developer.arm.com/documentation/den0022/d
5.5.2 describes what the OS must do first, in particular interrupts must be migrated away
from the CPU calling CPU_OFF. Currently the cpuhp notifiers do this, which after this
patch, no longer run.
I think it is relatively slow because the CPUs are shutdown sequentially. Besides ia64, most of the architectures that support kexec, appear to kill them all at once. I will try again. Thanks for the pointers.
You're going to need some duct-tape here, but I recall the proposed
'ARCH_OFFLINE_CPUS_ON_REBOOT', which would help, but isn't a complete thing. From the
discussion:
https://lore.kernel.org/lkml/87h80vwta7.fsf@xxxxxxxxxxxxxxxxxxxxxxx/
https://lore.kernel.org/lkml/alpine.DEB.2.21.1908201321200.2223@xxxxxxxxxxxxxxxxxxxxxxx/
using cpuhp to offline these CPUs is the right thing to do.
If the problem is its too slow, can we tackled that instead?
If all the CPUs haven't exited (num_online_cpus() > 1), machine_kexec() will panic if it isn't a crash kernel. Preferably we want to find out sooner rather than later if it isn't going to finish.
Using smp_send_stop() instead ofFor kexec we must wait for the CPU to exit the current kernel. If it doesn't we can't
smp_shutdown_nonboot_cpus() reduces the shutdown time for 23 CPUs
from about .65 seconds on an idle system to less than 5 msecs. On a
busy system smp_shutdown_nonboot_cpus() may take several seconds,
while smp_send_stop() needs only the 5 msecs.
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4784011cecac..2568452a2417 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -142,12 +143,22 @@ void arch_cpu_idle_dead(void)
* This must completely disable all secondary CPUs; simply causing those CPUs
* to execute e.g. a RAM-based pin loop is not sufficient. This allows the
* kexec'd kernel to use any and all RAM as it sees fit, without having to
- * avoid any code or data used by any SW CPU pin loop. The CPU hotplug
- * functionality embodied in smpt_shutdown_nonboot_cpus() to achieve this.
+ * avoid any code or data used by any SW CPU pin loop. The target stop function
+ * will call cpu_die() if kexec_in_progress is set.
*/
void machine_shutdown(void)
{
- smp_shutdown_nonboot_cpus(reboot_cpu);
+ unsigned long timeout;
+
+ /*
+ * Don't wait forever, but no longer than a second
+ */
overwrite the current memory image with the kexec payload.
Thanks,
Features like CNP allow CPUs to share TLB entries. If a CPU is left behind in the older
kernel, the code its is executing will be overwritten and its behaviour stops being
predictable. It may start allocating junk TLB entries, that CNP allows CPUs in the new
kernel to use, resulting in hard to debug crashes.
For kdump we avoid this problem by ensuring the old and new kernels never overlap. The old
kernel doesn't even have the kdump carveout mapped.
+ timeout = USEC_PER_SEC;
+
+ smp_send_stop();
+ while (num_online_cpus() > 1 && timeout--)
+ udelay(1);
+ return;
}
/*
Thanks,
James