Re: [PATCH V4 0/5] arm_scmi: vendors: Qualcomm Generic Vendor Extensions

From: Sibi Sankar
Date: Mon Dec 23 2024 - 09:49:40 EST




On 12/17/24 20:15, Cristian Marussi wrote:
On Tue, Dec 17, 2024 at 05:55:35PM +0530, Sibi Sankar wrote:


On 12/5/24 22:31, Sudeep Holla wrote:
On Fri, Nov 22, 2024 at 09:37:47AM +0100, Johan Hovold wrote:
On Thu, Nov 14, 2024 at 09:52:12AM +0530, Sibi Sankar wrote:
On 11/8/24 20:44, Johan Hovold wrote:

On Wed, Nov 06, 2024 at 01:55:33PM +0100, Johan Hovold wrote:

Second, after loading the protocol and client drivers manually (in that
order, shouldn't the client driver pull in the protocol?), I got:

scmi_module: Loaded SCMI Vendor Protocol 0x80 - Qualcomm 20000
arm-scmi arm-scmi.0.auto: QCOM Generic Vendor Version 1.0
scmi-qcom-generic-ext-memlat scmi_dev.5: error -EOPNOTSUPP: failed to configure common events
scmi-qcom-generic-ext-memlat scmi_dev.5: probe with driver scmi-qcom-generic-ext-memlat failed with error -95

which seems to suggest that the firmware on my CRD does not support this
feature. Is that the way this should be interpreted? And does that mean
that none of the commercial laptops support this either?

Yeah, hopefully Sibi can shed some light on this. I'm using the DT
patch (5/5) from this series, which according to the commit message is
supposed to enable bus scaling on the x1e80100 platform. So I guess
something is missing in my firmware.

Nah, it's probably just because of the algo string used.
The past few series used caps MEMLAT string instead of
memlat to pass the tuneables, looks like all the laptops
haven't really switched to it yet. Will revert back to
using lower case memlat so that all devices are
supported. Thanks for trying the series out!

I have a Lenovo ThinkPad T14s set up now so I gave this series a spin
there too, and there I do *not* see the above mentioned -EOPNOTSUPP error
and the memlat driver probes successfully.

On the other hand, this series seems to have no effect on a kernel
compilation benchmark. Is that expected?


Hijacking this thread to rant about state of firmware implementation on
this platform that gives me zero confidence in merging any of these without
examining each of the interface details in depth and at length.



Hi Sibi,

Hey Sudeep,

Thanks for taking time to review the series.

Also, I see that standard protocols like PERF seem to have so many issues, which
adds to my lack of confidence. I can't comment on that thread for specific reasons.

^^ is largely untrue, a lot of finger pointing and a gross
misrepresentation of reality :/

The only major problem that X1E perf protocol has is a firmware
crash in the LEVEL_GET regular message implementation. This
pretty much went unnoticed because of messaging in perf implementation
in kernel. Given the fastchannel implementation isn't mandatory
according to spec, the kernel clearly says it switches to
regular messaging when it clearly doesn't do that and uses
stale data structures instead. This ensured that level get regular
messaging never got tested.

You claimed this a couple of times here and on IRC, but sincerely,
looking at the fastchannel implementation in SCMI core and Perf, I could
not track down where this could have happened in the recent code
(i.e. with or without your recent, welcomed, patches...)

When FC initialization fails and bails out, it says:

"Failed to get FC for protocol %X [MSG_ID:%u / RES_ID:%u] - ret:%d. Using regular messaging."

... and it clears any gathered address for that FC, so that in __scmi_perf_level_get()
you end up skipping the FC machinery and use messaging

if (dom->fc_info && dom->fc_info[PERF_FC_LEVEL].get_addr) {
...
}

return scmi_perf_msg_level_get(ph, dom->id, level, poll);

Now this is done ONLY for the FC that specifically failed
initialization, i.e. identified by the tuple PROTO_ID/MSG_ID/RES_ID
(as stated in the noisy message above where MSG_ID is specified) NOT for
all Fastchannels, so you can have an FC successfully initialized only on
the GET but failing in the SET, so only the GET FC will be used.
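
The per-FC fallback described above can be modelled in a few lines of
standalone C. This is only a rough sketch and not the kernel code: the
types and helpers (struct perf_dom, fc_init_one(), perf_level_get(),
perf_level_set(), the msg_calls counter) are hypothetical stand-ins, and
the "regular message" path is simulated with a plain struct field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum fc_index { FC_LEVEL, FC_LIMIT, FC_MAX };

struct fc_info {
	uint32_t *get_addr;	/* NULL: GET fastchannel not available */
	uint32_t *set_addr;	/* NULL: SET fastchannel not available */
};

struct perf_dom {
	struct fc_info fc[FC_MAX];
	uint32_t msg_level;	/* level held by the simulated firmware */
	unsigned int msg_calls;	/* how often regular messaging was used */
};

/*
 * Simulated init of ONE fastchannel (one MSG_ID/RES_ID tuple).
 * On failure only this slot is cleared; other slots are untouched.
 */
static void fc_init_one(uint32_t **slot, uint32_t *addr, bool ok)
{
	assert(slot != NULL);
	*slot = ok ? addr : NULL;
}

/* Simulated regular-message LEVEL_GET. */
static uint32_t msg_level_get(struct perf_dom *dom)
{
	dom->msg_calls++;
	return dom->msg_level;
}

/* Use the GET fastchannel if it was set up, else fall back to messaging. */
static uint32_t perf_level_get(struct perf_dom *dom)
{
	if (dom->fc[FC_LEVEL].get_addr)
		return *dom->fc[FC_LEVEL].get_addr;

	return msg_level_get(dom);
}

/* Same decision for SET, taken independently of the GET slot. */
static void perf_level_set(struct perf_dom *dom, uint32_t level)
{
	if (dom->fc[FC_LEVEL].set_addr) {
		*dom->fc[FC_LEVEL].set_addr = level;
		return;
	}

	dom->msg_calls++;
	dom->msg_level = level;
}
```

With the GET slot initialized and the SET slot failed, perf_level_get()
reads the fastchannel region while perf_level_set() goes through the
simulated message path, which matches the GET-only case above.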

I don't really understand how the kernel was misbehaving and using
stale data instead; nor, if this was the case, can I see where this
issue would have been fixed.

To be clear, I am not really interested in throwing an argument here, but
I sincerely don't see where the alleged problem was and how it was fixed (kernel
side), so I fear it could be still there, hidden maybe by a change in the
platform fw.

Apologies if I missed something along the history of this..

lol, this is pretty embarrassing :|. It's just like you said:
it looks like this fw supports the get_level fastchannel but fails
to say it supports it. This was the reason the get_level
regular message was never tested and stayed buggy; it
had nothing to do with the kernel messaging or the kernel being buggy.
My bad :(, sry again.

-Sibi


Thanks,
Cristian