Re: [PATCH v8 15/19] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system

From: Will Deacon
Date: Thu Jun 03 2021 - 13:41:06 EST


On Thu, Jun 03, 2021 at 01:58:56PM +0100, Mark Rutland wrote:
> On Wed, Jun 02, 2021 at 05:47:15PM +0100, Will Deacon wrote:
> > If we want to support 32-bit applications, then when we identify a CPU
> > with mismatched 32-bit EL0 support we must ensure that we will always
> > have an active 32-bit CPU available to us from then on. This is important
> > for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
> > CPUs for compat tasks and forced migration due to a hotplug event will
> > hang if no 32-bit CPUs are available.
> >
> > On detecting a mismatch, prevent offlining of either the mismatching CPU
> > if it is 32-bit capable, or find the first active 32-bit capable CPU
> > otherwise.
> >
> > Reviewed-by: Catalin Marinas <catalin.marinas@xxxxxxx>
> > Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> > ---
> > arch/arm64/kernel/cpufeature.c | 20 +++++++++++++++++++-
> > 1 file changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 4194a47de62d..b31d7a1eaed6 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -2877,15 +2877,33 @@ void __init setup_cpu_features(void)
> >
> > static int enable_mismatched_32bit_el0(unsigned int cpu)
> > {
> > + static int lucky_winner = -1;
>
> This is cute, but could we please give it a meaningful name, e.g.
> `pinned_cpu` ?

I really don't see the problem, nor why it's "cute".

Tell you what, I'll add a comment instead:

/*
* The first 32-bit-capable CPU we detected and so can no longer
* be offlined by userspace. -1 indicates we haven't yet onlined
* a 32-bit-capable CPU.
*/

> > struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
> > bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
> >
> > if (cpu_32bit) {
> > cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
> > static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> > - setup_elf_hwcaps(compat_elf_hwcaps);
> > }
> >
> > + if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> > + return 0;
> > +
> > + if (lucky_winner >= 0)
> > + return 0;
> > +
> > + /*
> > + * We've detected a mismatch. We need to keep one of our CPUs with
> > + * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
> > + * every CPU in the system for a 32-bit task.
> > + */
> > + lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
> > + cpu_active_mask);
> > + get_cpu_device(lucky_winner)->offline_disabled = true;
> > + setup_elf_hwcaps(compat_elf_hwcaps);
> > + pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
> > + cpu, lucky_winner);
> > return 0;
> > }
>
> I guess this is going to play havoc with kexec and hibernate. :/

The kernel can still offline the CPUs (see the whole freezer mess that I
linked to in the cover letter). What specific havoc are you thinking of?

Will