Re: [PATCH RFC RFT] interconnect: qcom: implement get_bw with rpmh_read

From: Neil Armstrong

Date: Wed Jan 14 2026 - 07:01:37 EST


On 1/14/26 11:31, Konrad Dybcio wrote:
On 1/14/26 11:07 AM, Neil Armstrong wrote:
On 1/14/26 11:01, Konrad Dybcio wrote:
On 1/13/26 6:53 PM, Georgi Djakov wrote:
On 11/6/25 6:46 PM, Neil Armstrong wrote:
Since we can actually read back the APPS rpmh interconnect
BCM votes we can actually implement the get_bw() callback
and provide a coherent average and peak bandwidth at probe time.

The benefits of that are:
- keep disabled BCMs disabled
- avoid voting unused BCMs to INT_MAX

If the interconnects are correctly described for a platform,
all the required BCMs would be voted to the maximum bandwidth
until sync_state is reached.

Since we only get the BCM vote, we need to redistribute
the vote values to the associated nodes. The initial BCM
votes are read back at probe time in order to be ready when
the get_bw() is called when a node is added.


FWIW, I was able to finally test this on sdm845. Some nodes are indeed
showing reasonable bandwidth values instead of the default INT_MAX.

As I learnt here

https://lore.kernel.org/linux-arm-msm/1e7594dc-dca6-42e7-b478-b063e3325aff@xxxxxxxxxxxxxxxx/

rpmh_read() will only retrieve the currently active values, so as-is,
this hunk:

+    /* For boot-up, fill the AMC vote in all buckets */
+    for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
+        bcm->vote_x[i] = x;
+        bcm->vote_y[i] = y;
+    }

is lying about the state of wake/sleep buckets

this is ""fine"" today, as I don't see any "if (old_bw == new_bw)" checks
across the framework, but debugfs is going to report incorrect values and
if anyone decides to add the aforementioned check, it may introduce issues
where the values aren't commited to the hardware (because Linux is going
to believe they're already set)

This is only for the pre-sync-state phase, where we don't need the wake/sleep
values but the interconnect rpmh implementation needs them, and anyway they will
be replaced by proper values in sync_state

I realize this may not be the most convincing argument, but consider
the case where sync_state can not be hit, for example with the Venus
driver that requests FW at probe time and errors out if it's absent

We're talking about initial states here, if a device votes for an interconnect
path, even before sync_state, the path will be voted with the requested bandwidth.

https://elixir.bootlin.com/linux/v6.18.5/source/drivers/interconnect/core.c#L295

Before this patch:
node->init_avg & node->init_peak are set to INT_MAX, so max(x, INT_MAX) always gives INT_MAX
After this patch:
node->init_avg & node->init_peak are from the boot, which can be 0. So we either vote
for the requested bandwidth, or floor the bandwidth set by the bootloader (could be a higher value).


So this is an informed & assumed choice I did here. It's a small optimization
to avoid turning on _all_ interconnects at INT_MAX, and keep boot votes
up to sync_state.

Another question is, whether that's a desired change - I could easily
see pinning buses to the maximum speed helping boot time KPIs, but
perhaps that could/should be configurable?

It's all about the rest of the bussed endpoints, enabling _all_ endpoints
to INT_MAX could potentially lead to in fact reducing bandwidth for crucial
devices like UFS because we configure everything to the max bandwidth, and enable
unused busses (and associated clocks & power domains) for nothing.

The idea in this patch is to keep the votes from bootloader, keep the disabled
endpoints and vote with requested bandwidth for devices which are used in the boot process.

Neil


Konrad