Re: [PATCH 00/10] GICv3 support for kexec/kdump on EFI systems

From: Xulin Sun
Date: Fri Feb 01 2019 - 22:08:47 EST


On 02/01/2019 05:15 PM, Marc Zyngier wrote:
Hi Xulin,

On 01/02/2019 06:11, Sun Ted wrote:
Hi Marc,

Marc Zyngier <marc.zyngier@xxxxxxx> wrote on Sat, Sep 22, 2018 at 4:03:

The GICv3 architecture has the remarkable feature that once LPI tables
have been assigned to redistributors and LPI delivery is enabled,
there is no guarantee that LPIs can be turned off (and most
implementations do not allow it), nor can the redistributors be
reprogrammed to use other tables.

This is a bit of a problem for kexec, where the secondary kernel
completely loses track of the previous allocations. If the secondary
kernel doesn't allocate the tables exactly the same way, no LPIs will
be delivered by the GIC (which continues to use the old tables), and
memory previously allocated for the pending tables will be slowly
corrupted, one bit at a time.
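
To make the "one bit at a time" corruption concrete: the LPI pending table holds one bit per interrupt ID, so a redistributor that still points at the old table sets a single bit each time an LPI becomes pending. What follows is only a userspace model under that assumption, not kernel code; the helper names model_set_pending() and count_flipped_bits() are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model only: "mem" stands in for memory that the first kernel
 * used as an LPI pending table and that the second kernel has since
 * reused for something else. The table has one bit per INTID, so
 * INTID n lives in byte n / 8, bit n % 8.
 */
static void model_set_pending(uint8_t *mem, uint32_t intid)
{
	mem[intid / 8] |= (uint8_t)(1u << (intid % 8));
}

/* Count how many bits differ between two buffers of the same length. */
static unsigned int count_flipped_bits(const uint8_t *a, const uint8_t *b,
				       size_t len)
{
	unsigned int flipped = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t diff = a[i] ^ b[i];

		while (diff) {
			flipped += diff & 1u;
			diff >>= 1;
		}
	}
	return flipped;
}
```

In this model, one pending LPI changes exactly one bit of whatever data now occupies the old table, which is why the damage is hard to notice.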

The workaround for this is based on a series[1] by Ard Biesheuvel,
which adds the required infrastructure for memory reservations to be
passed from one kernel to another using an EFI table.

This infrastructure is then used to register the allocation of GIC
tables with EFI, and allow the GIC driver to safely reuse the existing
programming if it detects that the tables have been correctly
registered. On non-EFI systems, there is not much we can do.
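
The reuse decision described above can be sketched roughly as follows. This is a simplified userspace model, not the driver's actual code: struct mem_reservation, range_is_reserved() and can_reuse_lpi_tables() are hypothetical names, and the real reservation table format is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one reservation passed from the first kernel. */
struct mem_reservation {
	uint64_t base;
	uint64_t size;
};

/* True if [addr, addr + len) lies entirely inside one recorded reservation. */
static bool range_is_reserved(const struct mem_reservation *rsv, size_t n,
			      uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		/* Written without addr + len to avoid integer overflow. */
		if (addr >= rsv[i].base &&
		    len <= rsv[i].size &&
		    addr - rsv[i].base <= rsv[i].size - len)
			return true;
	}
	return false;
}

/*
 * The existing programming is only safe to reuse if both the property
 * table and the pending table are covered by a reservation.
 */
static bool can_reuse_lpi_tables(const struct mem_reservation *rsv, size_t n,
				 uint64_t prop_base, uint64_t prop_len,
				 uint64_t pend_base, uint64_t pend_len)
{
	return range_is_reserved(rsv, n, prop_base, prop_len) &&
	       range_is_reserved(rsv, n, pend_base, pend_len);
}
```

If either table falls outside the recorded reservations, the model refuses to reuse the existing programming, which mirrors the non-EFI case where nothing has been registered.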


Sorry to bring this question up again.
For other, non-EFI systems, just as you said, until now we didn't do
anything, right?
We didn't do anything, because there is nothing we can do.

I did see a kexec boot failure, because resetting
GICR_CTLR.EnableLPI from "1" to "0" was unsuccessful.

The commit below adds a check when disabling LPIs and returns an
error if it fails, which caused the "kexec" boot failure.
And I fully expected this. When I said "we don't do anything", I meant
"we don't do anything to make LPIs work".

6eb486b66a (irqchip/gic-v3: Ensure GICR_CTLR.EnableLPI=0 is observed
before enabling).

<snip patch>
 int its_cpu_init(void)
 {
        if (!list_empty(&its_nodes)) {
-               if (!gic_rdists_supports_plpis()) {
-                       pr_info("CPU%d: LPIs not supported\n", smp_processor_id());
-                       return -ENXIO;
-               }
+               int ret;
+
+               ret = redist_disable_lpis();
+               if (ret)
+                       return ret;
+
And I maintain that this is the right thing to do. If LPIs are on and
the memory has not been reserved, it is then likely that this memory is
now being used by the kernel for something else. The system is thus
going to see single-bit corruption in some random places.
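
The behaviour under discussion (try to clear EnableLPI, read it back, refuse to continue if it is still set) can be modelled in a few lines. This is a userspace sketch under stated assumptions, not the kernel function: struct model_redist, its enable_is_sticky flag and the -EBUSY return value are choices made for this illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CTLR_ENABLE_LPIS (1u << 0) /* GICR_CTLR.EnableLPI, bit 0 */

/* Hypothetical stand-in for one redistributor's control register. */
struct model_redist {
	uint32_t ctlr;
	bool enable_is_sticky; /* models hardware that cannot turn LPIs off */
};

/* Model of a register write: a sticky enable bit ignores attempts to clear it. */
static void model_write_ctlr(struct model_redist *rd, uint32_t val)
{
	if (rd->enable_is_sticky && (rd->ctlr & MODEL_CTLR_ENABLE_LPIS))
		val |= MODEL_CTLR_ENABLE_LPIS;
	rd->ctlr = val;
}

/* Returns 0 if LPIs are off afterwards, a negative errno otherwise. */
static int model_disable_lpis(struct model_redist *rd)
{
	if (!(rd->ctlr & MODEL_CTLR_ENABLE_LPIS))
		return 0; /* already off, nothing to do */

	model_write_ctlr(rd, rd->ctlr & ~MODEL_CTLR_ENABLE_LPIS);

	/* Read back: if the bit is still set, the old tables are still live. */
	if (rd->ctlr & MODEL_CTLR_ENABLE_LPIS)
		return -EBUSY;

	return 0;
}
```

With a sticky enable bit the model returns an error instead of carrying on, which corresponds to the boot failure reported on non-EFI systems.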

At that stage, your system is horribly unsafe, and I will not let it
boot under these conditions. If it was working before, that's because
you were lucky, and I place no faith in luck.

Now you have two alternatives:

- You switch to an EFI-based firmware. These days, even u-boot has an
EFI implementation. Consider doing that if you can.

- If there is no EFI implementation for your SoC, you can pass the
"irqchip.gicv3_nolpi=1" option to the first kernel. This will keep LPIs
disabled, and you'll be able to kexec a secondary kernel (and get
working LPIs there). This is what I do on my Chromebook.

Hi Marc,

Thanks for the detailed explanation.

Thanks
Xulin

Thanks,

M.