Re: [PATCH v2 4/4] arm64: dts: qcom: sc7280: Add cpu and llcc BWMON (=> interconnect issue)

From: Matthias Kaehlcke
Date: Fri Jan 20 2023 - 16:32:53 EST


On Tue, Jan 17, 2023 at 05:47:14PM +0000, Matthias Kaehlcke wrote:
> On Tue, Jan 17, 2023 at 06:33:41PM +0100, Krzysztof Kozlowski wrote:
> > On 17/01/2023 18:27, Matthias Kaehlcke wrote:
> > >
> > >>> which would set the initially bandwidths to 0 and determine the actually
> > >>> needed bandwidth. But since the driver isn't probed the initial
> > >>> bandwidths stay at INT_MAX.
> > >>>
> > >>> This isn't actually an issue with this patch, but how the interconnect
> > >>> framework deals with devices that are registered on the bus, but aren't
> > >>> probed (yet). Not sure how this would be best fixed. Georgi, do you have
> > >>> any ideas?
> > >>
> > >> Why the device is not probed (yet)? If it is registered, it will come
> > >> soon during boot up.
> > >
> > > Because CONFIG_QCOM_ICC_BWMON is not enabled for the board in question (see
> > > above). It could be enabled as a short term mitigtion, however we shouldn't
> > > require drivers to be enabled just because the DT has a corresponding node.
> >
> > It's the same case as with all other interconnect leafs/consumers. The
> > same behavior if you do not have it enabled, right? If not, I wonder
> > what is here different?
>
> Right, this is a general issue. The problem on sc7280 (and probably other
> Qualcomm SoCs) is that the interconnect link at full throttle prevents the
> SoC from entering its low power mode (AOSS sleep) during system suspend.
> On many boards this might go unnoticed, on herobrine the condition is
> detected by the embedded controller (EC) and considered a failed suspend,
> which results in waking up the system.

I did some hackery to convince the EC to enter/stay in S3, despite AOSS
no entering sleep mode. That allowed me to take power measurements. With
the kernel suspended but the AOSS remaining on the power consumption of
the Qcard is more than 7x higher than when the AOSS enters sleep mode!
On a Qcard system I can't break the power consumption further down, but
I think the extra power consumption must be coming mostly from the SoC
itself, since the kernel and the EC are essentially in the same state as
during a suspend with AOSS in sleep mode.

The enormous increase in power consumption suggests that this is a serious
issue for non-Chrome OS platforms as well. On herobrine and trogdor boards
we have the 'luxury' of being able to detect that AOSS stays awake (though
it comes with the caveat that the system can't suspend when that happens).
On other boards this goes likely unnoticed until someone measures suspend
power or notices a significant regression in S3 battery life.

It seems something needs to be done at the interconnect core to fix this.
Is it really necessary to init all link to max bandwidth? Maybe it is
needed for certain links, but not all of them? What is the background
here?