Re: [PATCH RFC 1/1] arm64: Use PSCI calls for CPU stop when hotplug is supported

From: Mark Rutland
Date: Wed Jan 23 2019 - 12:33:51 EST


On Wed, Jan 23, 2019 at 09:05:26AM -0800, Scott Branden wrote:
> Hi Mark,
>
> Hopefully I can shed some light on the use case inline.
>
> On 2019-01-23 8:48 a.m., Mark Rutland wrote:
> > On Mon, Jan 21, 2019 at 11:30:02AM +0530, Pramod Kumar wrote:
> > > On Mon, Jan 21, 2019 at 11:28 AM Pramod Kumar <pramod.kumar@xxxxxxxxxxxx>
> > > wrote:
> > >
> > > Need comes from a specific use case where one Accelerator card(SoC) is
> > > plugged in a server over a PCIe interface. This Card gets supply from a
> > > battery, which could provide very less power for a very small time, in case
> > > of any power loss. Once Card switches to battery, this has to reduce its
> > > power consumption to its lowest point and back-up the DDR contents asap
> > > before battery gets fully drained off.
> > In this example is Linux running on the server, or on the accelerator?
> Accelerator
> >
> > What precisely are you trying to back up from DDR, and why?
> Data in DDR is being written to disk at this time (disk is connected to
> accelerator)
> >
> > What is responsible for backing up that contents?
>
> A low power M-class processor and DMA engine which continues necessary
> operations to transfer DDR memory to disk.
>
> The high power processors on the accelerator running linux needed to be
> halted ASAP on this power loss event and M0 take over. Graceful shutdown of
> linux and other peripherals is unnecessary (and we don't have the power
> necessary to do so).

If graceful shutdown of Linux is not required (and is in fact
undesirable), why is Linux involved at all in this shutdown process?

For example, why is this not a secure interrupt taken to EL3, which can
(gracefully) shut down the CPUs regardless?

> > > Since battery can provide limited power for a very short time hence need to
> > > transition to lowest power. As per the transition process , CPUs power
> > > domain has to be off but before that it needs to flush out its content to
> > > system memory(L3) so that content could be backed-up by a MCU, a controller
> > > consuming very less power. Since we can not afford plugging-out every
> > > individual CPUs in sequence hence uses ipi_cpu_stop for all other CPUs
> > > which ultimately switch to ATF to flush out all the CPUs caches and comes
> > > out of coherency domain so that its power rails could be switched-off.
> > If you're stopping CPUs from completely arbitrary states, what is the
> > benefit of saving the RAM contents?
>
> Some of the RAM contains data that was in the process of being written to
> disk by the accelerator.

Ok, so this isn't actually about backing up RAM contents; it's about
completing pending I/O.

I'm still confused as to how that works. How do you avoid leaving the
disk in some corrupt state if data runs out partway through?

> This data must be saved to disk and the high power CPUs consume too much
> power to continue performing this operation.
>
> > CPUs might be running with IRQs disabled for an arbitrarily long time,
>
> In an embedded linux system we control everything running.

Sure, and that complete control allows you to do something better than
this RFC, AFAICT.

Thanks,
Mark.