Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes

From: Cristian Marussi
Date: Fri Oct 25 2024 - 04:29:13 EST


On Fri, Oct 25, 2024 at 12:15:59PM +0530, Sibi Sankar wrote:
>
>
> On 10/25/24 11:44, Dmitry Baryshkov wrote:
> > On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> > >

Hi,

> > >
> > > On 10/23/24 21:56, Johan Hovold wrote:
> > > > On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> > > > > On 10/10/24 20:32, Johan Hovold wrote:
> > > > > > On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> > > > > > > The series addresses the kernel warnings reported by Johan at [1] and are
> > > > > > > are required to X1E cpufreq device tree changes [2] to land.
> > > > > > >
> > > > > > > [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@xxxxxxxxxxxxxxxxxxxx/
> > > > > > > [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@xxxxxxxxxxx/
> > > > > > >
> > > > > > > The following warnings remain unadressed:
> > > > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > > >
> > > > > > Are there any plans for how to address these?
> > > >
> > > > > Sorry missed replying to this. The error implies that duplicate
> > > > > opps are reported by the SCP firmware and appear once during probe.
> > > >
> > > > I only see it at boot, but it shows up four times here with the CRD:
> > >
> > > https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@xxxxxxxxxxx/
> > >
> > > As explained ^^, we see duplicates for max sustainable performance twice
> > > for each domain.
> >
> > If existing products were shipped with the firmware that lists single
> > freq twice, please filter the frequencies like qcom-cpufreq-hw does.
>
> That was a qualcomm specific driver and hence we could do such
> kind of filtering. This however is the generic scmi perf protocol
> and it's not something we should ever consider introducing :/
>

+1

In the case of the other warnings, those were similarly related to
duplicates, but the warns themselves were genereated by the OPP
subsystem when trying to register a duplicate...it was indeed a bug
at the SCMI layer to try registering a well-known duplicate with
the OPP subsytem, so it was fixed in the SCMI stack...avoiding to
propagate it to the OPP layer...but the duplicate error condition
indeed still exist (since dependent on what the fw spits out) and they
are trapped at the SCMI level and those noisy warning are just meant
to trap this kind of firmware anomalies...

...IOW who would have ever spotted this thing and considered to fix the
firmware in future releases without the warnings :P ?

> >
> > >
> > > >
> > > > [ 8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > [ 8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > [ 8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > [ 8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >
> > > > > This particular error can be fixed only by a firmware update and you
> > > > > should be able to test it out soon on the CRD first.
> > > >
> > > > Can you explain why this can only be fixed by a firmware update? Why
> > > > can't we suppress these warnings as well, like we did for the other
> > > > warnings related to the duplicate entries?
> > > >
> > > > IIUC the firmware is not really broken, but rather describes a feature
> > > > that Linux does not (yet) support, right?
> > >
> > > We keep saying it's a buggy firmware because the SCP firmware reports
> > > identical perf and power levels for the additional two opps and the
> > > kernel has no way of treating it otherwise and we shouldn't suppress
> > > them. Out of the two duplicate opps reported one is a artifact from how
> > > Qualcomm usually show a transition to boost frequencies. The second opp
> > > which you say is a feature should be treated as a boost opp i.e. one
> > > core can run at max at a lower power when other cores are at idle but
> > > we can start marking them as such once they start advertising their
> > > correct power requirements. So I maintain that this is the best we
> > > can do and need a firmware update for us to address anything more.
> >
> > Will existing shipping products get these firmware updates?
>
> They are sure to trickle out but I guess it's upto the oem
> to decide if they do want to pick these up like some of the
> other firmware updates being tested only on CRD. Not sure why
> warnings duplicates should block cpufreq from landing for x1e
> but if that's what the community wants I can drop reposting
> this series!

Not sure indeed which is the problem with such warnings since they are
just doing their job...in general we tend not to disrupt operation even
when the firmware is buggy (if possible) BUT we definitely try to be
noisy about that to have firmware fixed (...not that fw guys seems so
scared usually about warnings though :P)

Anyway, I'm totally with Sibi here unless there is an impact at the
functional level...Sudeep may think otherwise of course ...

Thanks
Cristian