Re: [PATCH] arm64/fpsimd: Ensure we don't contend a SMCU from idling CPUs

From: Mark Brown
Date: Wed Jul 10 2024 - 19:28:23 EST


On Wed, Jul 10, 2024 at 09:10:53PM +0100, Catalin Marinas wrote:
> On Tue, Jun 18, 2024 at 02:57:42PM +0100, Mark Brown wrote:

> > When we enter the kernel we currently don't update any of the floating
> > point register state until either something else uses floating point or we
> > get a CPU_PM_ENTER notification during suspend or cpuidle. This means that
> > for a system which has been configured with both suspend and cpuidle
> > disabled we will leave whatever floating point state was loaded in the
> > registers present while a CPU is idling.

> I guess this approach is useful when the kernel does a light WFI rather
> than going all the way to firmware for a deep sleep state. In general,
> the shallower the sleep state is, the more CPU state is retained. From a
> power perspective, I wonder whether we should leave the decision to drop
> the SME state to a cpuidle driver.

The concern here is the case where we don't have a cpuidle driver at
all - we could also make this conditional on !CPUIDLE.
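
Something like this, purely as a rough and untested sketch of what I
mean - the IS_ENABLED(CONFIG_CPU_IDLE) check is the only change
relative to the hunk quoted further down:

	/*
	 * Only drop the SME state on idle entry when there is no
	 * cpuidle driver around to make a better informed decision.
	 */
	if (!IS_ENABLED(CONFIG_CPU_IDLE) && system_supports_sme() &&
	    (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)))
		fpsimd_save_and_flush_cpu_state();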

> Which situations should we consider for such idle scenario (we discussed
> them offline briefly)? I think:

> 1. Thread migration: a thread using SME is moved to a different CPU.
> Here SMSTOP makes sense because a new thread scheduled eventually
> will need a different SME state.

> 2. Thread page fault followed by waiting for I/O etc. and the kernel may
> switch to idle. Here it's probably less beneficial to do an SMSTOP.

> Any other cases? Blocking syscalls don't matter since we don't preserve
> the state between calls.

For syscalls we explicitly disable streaming mode, but we do allow ZA to
remain active, so you might have a blocking syscall with ZA enabled.
Having state in ZA is less of a concern than streaming mode: it will
have a power impact, but it is much less likely to have a performance
impact on other cores.
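
For reference the two bits are reported separately in SVCR, so in
principle we could tell the two cases apart - a minimal sketch using the
same definitions as the patch, not something the patch currently does:

	u64 svcr = read_sysreg_s(SYS_SVCR);
	bool sm_active = svcr & SVCR_SM_MASK;	/* streaming mode, may contend the SMCU */
	bool za_active = svcr & SVCR_ZA_MASK;	/* ZA storage, mainly a power cost */

For now the patch just treats either bit being set as a reason to save
and flush.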

> The trade-off is for case (2) above and it depends on whether it happens
> sufficiently often to be noticeable. I wouldn't think so.

Yes, to me it seems much more likely that we would decide to schedule a
task out while it was using SME than that we would take faults where the
overhead of reloading the state was noticeable.

> > +	/*
> > +	 * Leaving SME enabled may leave this core contending with
> > +	 * other cores if we have a SMCU, disable whenever we enter
> > +	 * idle to avoid this. Only do this if they're actually
> > +	 * enabled to avoid overhead in cases where we don't enter a
> > +	 * low enough power state to lose register state.
> > +	 */
> > +	if (system_supports_sme() &&
> > +	    (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)))
> > +		fpsimd_save_and_flush_cpu_state();
> > +}

> Do we always enter here via the idle thread? If we already had a thread
> switch we probably don't need to save the SME state again, only flush
> the state.

If we've actually switched threads then TIF_FOREIGN_FPSTATE will have
been set, so the save portion will just check the flag and return rather
than actually writing any register values out, and the overhead should
be minimal. It feels safer to do the check anyway in case we get better
at doing the save lazily.
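
Roughly, the save half of fpsimd_save_and_flush_cpu_state() behaves like
this (a simplified sketch from memory, not a quote of the actual code):

	static void fpsimd_save(void)
	{
		/* Registers don't hold this task's current state, nothing to write. */
		if (test_thread_flag(TIF_FOREIGN_FPSTATE))
			return;

		/* ...otherwise write the FPSIMD/SVE/SME registers out... */
	}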

Otherwise arch_cpu_idle_enter() is called from psci_checker as well as
the idle thread, but this code should not be relevant either way in that
context: it runs before userspace and AIUI it's trying to do the same
thing as the idle thread there anyway.
