Re: [PATCH 2/2] interconnect: qcom: add MSM8x60 NoC driver

From: Dmitry Baryshkov

Date: Wed Jun 03 2026 - 19:00:28 EST


On Wed, Jun 03, 2026 at 06:34:10PM +0200, Herman van Hazendonk wrote:
> Add a Qualcomm interconnect driver for the MSM8x60 family modelling the
> four NoC fabrics (APPSS, System, MMSS, DFAB) that connect masters and
> slaves on these Scorpion-class SoCs. The driver implements the
> interconnect-provider API to manage bandwidth between specific masters
> and slaves via the RPM arbitration tables.
>
> Each fabric carries:
> - A bus clock (managed via clk_bulk APIs) whose rate is the
> aggregated bandwidth divided by the fabric bus width, with a
> minimum floor of 384 MHz to prevent USB starvation observed when
> the fabric drops to the previously-used 266 MHz minimum.
> - An RPM arbitration buffer (arb / bwsum) that the RPM firmware
> consumes via its shared-memory protocol; commits go via the
> qcom-rpm driver's set_resource API.
>
> msm8660_get_rpm() pins the supplier with device_link_add() before
> reading drvdata so an unbind/rebind window cannot leave a stale
> qcom_rpm pointer. clk_bulk_prepare_enable is paired with a
> devm_add_action_or_reset cleanup so an EPROBE_DEFER from the RPM
> lookup does not leak the prepare/enable refcount across retries. The
> fabric rate cap uses min_t(u64,...) so a bandwidth request exceeding
> 4 GiB/s cannot wrap through u32 truncation into a near-zero clock
> rate that would halt the interconnect.

Thanks for working on this.

>
> Signed-off-by: Herman van Hazendonk <github.com@xxxxxxxxxx>
> ---
> drivers/interconnect/qcom/Kconfig | 14 +
> drivers/interconnect/qcom/Makefile | 2 +
> drivers/interconnect/qcom/msm8660.c | 1147 +++++++++++++++++
> .../dt-bindings/interconnect/qcom,msm8660.h | 6 +-
> 4 files changed, 1166 insertions(+), 3 deletions(-)
> create mode 100644 drivers/interconnect/qcom/msm8660.c
>
> +
> +/*
> + * RPM fabric arbitration data format (from legacy vendor kernel msm_bus_fabric.c):
> + *
> + * Each u16 arb entry: bit 15 = tier (1=TIER1 high priority), bits 14-0 = BW
> + * Bandwidth is in 128KB units (bytes >> 17).
> + * Two u16 values are packed into each u32 RPM register word.
> + *
> + * Buffer layout: [bwsum pairs] [arb pairs]
> + * bwsum[slave_port] = total bandwidth to that slave
> + * arb[(tier-1)*nmasters + master_port] = per-master arbitration entry
> + */
> +#define ARB_BWMASK 0x7FFF
> +#define ARB_TIERMASK 0x8000
> +#define ARB_TIER1 1
> +#define ARB_TIER2 2
> +
> +static const struct clk_bulk_data msm8660_afab_clocks[] = {
> + { .id = "bus" },
> + { .id = "bus_a" },
> + { .id = "ebi1" },
> + { .id = "ebi1_a" },

What clocks are those? Please provide the actual dt-bindings schema
(together with the examples). In the past we had bus / bus_a clocks for
the SMD-RPM-based interconnects, but lyter they were ripped out and
replaced with the direct voting for the resources.

> +};
> +
> +
> +/*
> + * Node definitions with RPM port mapping.
> + *
> + * DEFINE_QNODE(_name, _id, _buswidth, _mas_port, _slv_port, _tier, links...)
> + *
> + * _mas_port: master port index for RPM ARB array (-1 if not a master)
> + * _slv_port: slave port index for RPM bwsum array (-1 if not a slave)
> + * _tier: master priority tier (ARB_TIER1=1, ARB_TIER2=2, 0 if N/A)
> + */
> +#define DEFINE_QNODE(_name, _id, _buswidth, _mas, _slv, _tier, ...) \

Please expand the macro.

> + static struct msm8660_icc_node _name = { \
> + .name = #_name, \
> + .id = _id, \
> + .buswidth = _buswidth, \
> + .num_links = COUNT_ARGS(__VA_ARGS__), \
> + .links = { __VA_ARGS__ }, \
> + .mas_port = _mas, \
> + .slv_port = _slv, \
> + .mas_tier = _tier, \
> + }
> +

[...]

> +/*
> + * Look up the RPM that owns fabric arbitration writes.
> + *
> + * Returns NULL if the DT does not have a "qcom,rpm" phandle (in which
> + * case the caller silently drops RPM ARB and runs the fabric purely
> + * via clk_set_rate).
> + *
> + * Returns ERR_PTR(-EPROBE_DEFER) if the RPM device exists in DT but
> + * its driver has not finished probing yet, or if device_link_add()
> + * fails. The caller is expected to propagate this so the interconnect
> + * driver gets retried once the RPM is ready.
> + *
> + * On success returns the qcom_rpm handle and pins the RPM device
> + * lifetime to ours via a consumer-supplier device link, so the
> + * devres-allocated qcom_rpm cannot be freed while we still hold a
> + * pointer to it.
> + */
> +static struct qcom_rpm *msm8660_get_rpm(struct device *dev)
> +{
> + struct device_node *rpm_np;
> + struct platform_device *rpm_pdev;
> + struct device_link *link;
> + struct qcom_rpm *rpm;
> +
> + rpm_np = of_parse_phandle(dev->of_node, "qcom,rpm", 0);
> + if (!rpm_np) {
> + dev_dbg(dev, "no qcom,rpm phandle, RPM ARB disabled\n");
> + return NULL;

Here and further, return dev_err_ptr_probe().

> + }
> +
> + rpm_pdev = of_find_device_by_node(rpm_np);
> + of_node_put(rpm_np);
> + if (!rpm_pdev) {
> + dev_dbg(dev, "RPM device not found yet, deferring probe\n");
> + return ERR_PTR(-EPROBE_DEFER);
> + }
> +
> + /*
> + * Pin the supplier BEFORE reading its drvdata. The device link
> + * (MANAGED, the default state) prevents the RPM driver from being
> + * unbound while we hold the link, which closes the window where a
> + * concurrent unbind+rebind could free the qcom_rpm pointer between
> + * dev_get_drvdata() and the link being established. If the link
> + * cannot be added (e.g. supplier is in the process of being
> + * removed) we defer and retry.
> + */
> + link = device_link_add(dev, &rpm_pdev->dev,
> + DL_FLAG_AUTOREMOVE_CONSUMER);
> + put_device(&rpm_pdev->dev);
> + if (!link) {
> + dev_warn(dev, "failed to add device link to RPM, deferring\n");
> + return ERR_PTR(-EPROBE_DEFER);
> + }
> +
> + /*
> + * Safe to read drvdata now: the device link pins the supplier so
> + * it cannot be unbound until our consumer (this interconnect
> + * provider) is unbound first.
> + */
> + rpm = dev_get_drvdata(&rpm_pdev->dev);
> + if (!rpm) {
> + dev_dbg(dev, "RPM not ready, deferring probe\n");
> + device_link_remove(dev, &rpm_pdev->dev);
> + return ERR_PTR(-EPROBE_DEFER);
> + }
> +
> + return rpm;
> +}
> +
> +static int msm8660_icc_probe(struct platform_device *pdev)
> +{
> + const struct msm8660_icc_desc *desc;
> + struct msm8660_icc_node * const *qnodes;
> + struct msm8660_icc_provider *qp;
> + struct device *dev = &pdev->dev;
> + struct icc_onecell_data *data;
> + struct icc_provider *provider;
> + struct icc_node *node;
> + size_t num_nodes, i;
> + int ret;
> +
> + desc = of_device_get_match_data(dev);
> + if (!desc)
> + return -EINVAL;
> +
> + qnodes = desc->nodes;
> + num_nodes = desc->num_nodes;
> +
> + qp = devm_kzalloc(dev, sizeof(*qp), GFP_KERNEL);
> + if (!qp)
> + return -ENOMEM;
> +
> + data = devm_kzalloc(dev, struct_size(data, nodes, num_nodes),
> + GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> + data->num_nodes = num_nodes;
> +
> + qp->bus_clks = devm_kmemdup(dev, desc->bus_clks,
> + desc->num_clks * sizeof(*desc->bus_clks),
> + GFP_KERNEL);
> + if (!qp->bus_clks)
> + return -ENOMEM;
> +
> + qp->num_clks = desc->num_clks;
> +
> + /*
> + * MSM8660 fabric clocks are managed by RPM firmware and may not be
> + * available in mainline Linux yet. Once the clock provider exists,
> + * we want to honour it; until then we run without per-fabric clock
> + * scaling. The crucial part is that -EPROBE_DEFER means "the
> + * provider exists but hasn't probed yet" and MUST be propagated so
> + * we get retried; only other errors (genuine -ENOENT, etc.) get
> + * downgraded to "no clocks, continue".

Please limit to -ENOENT only.

> + */
> + ret = devm_clk_bulk_get_optional(dev, qp->num_clks, qp->bus_clks);
> + if (ret == -EPROBE_DEFER)
> + return ret;
> + if (ret) {
> + dev_warn(dev, "Failed to get bus clocks: %d (continuing without clock scaling)\n",
> + ret);
> + qp->num_clks = 0;
> + }
> +
> + if (qp->num_clks) {
> + ret = clk_bulk_prepare_enable(qp->num_clks, qp->bus_clks);
> + if (ret) {
> + dev_warn(dev, "Failed to enable bus clocks: %d\n", ret);
> + qp->num_clks = 0;
> + } else {
> + /*
> + * Register the cleanup right after a successful
> + * prepare_enable so any later -EPROBE_DEFER or other
> + * probe error path (e.g. msm8660_get_rpm failing
> + * with -EPROBE_DEFER below) does not leak a clock
> + * prepare/enable reference across the retry.
> + */
> + ret = devm_add_action_or_reset(dev,
> + msm8660_icc_clk_release, qp);
> + if (ret)
> + return ret;
> + }
> + }
> +
> + /* Set up RPM fabric arbitration */
> + qp->desc = desc;
> + if (desc->rpm_resource >= 0) {
> + qp->rpm = msm8660_get_rpm(dev);
> + if (IS_ERR(qp->rpm))
> + return PTR_ERR(qp->rpm);
> + if (qp->rpm) {
> + int arb_size = desc->nmasters * desc->ntieredslaves;
> +
> + qp->bwsum = devm_kcalloc(dev, desc->nslaves,
> + sizeof(u16), GFP_KERNEL);
> + qp->arb = devm_kcalloc(dev, arb_size,
> + sizeof(u16), GFP_KERNEL);
> + qp->rpm_buf = devm_kcalloc(dev, desc->rpm_buf_size,
> + sizeof(u32), GFP_KERNEL);
> + if (!qp->bwsum || !qp->arb || !qp->rpm_buf) {
> + dev_warn(dev, "RPM buffer alloc failed, ARB disabled\n");
> + qp->rpm = NULL;
> + } else {
> + int rc;
> +
> + dev_info(dev, "RPM fabric ARB enabled (%d masters, %d slaves, %d tiered)\n",
> + desc->nmasters, desc->nslaves,
> + desc->ntieredslaves);
> +
> + /*
> + * One-shot sleep-context vote of zero bandwidth.
> + * Without an explicit SLEEP_STATE write, RPM has no
> + * fabric bandwidth target for deep-sleep and may
> + * keep the active vote applied indefinitely,
> + * preventing DDR from dropping its rate when CPUs
> + * power-collapse. The buffer is devm_kcalloc'd so
> + * it is all-zero at this point — written before
> + * any consumer can drive an active vote that would
> + * dirty it.
> + *
> + * msm8660_rpm_commit() writes ACTIVE_STATE only;
> + * SLEEP_STATE remains zero for the provider's
> + * lifetime, so this vote does not need refreshing.
> + */
> + rc = qcom_rpm_write(qp->rpm,
> + QCOM_RPM_SLEEP_STATE,
> + desc->rpm_resource,
> + qp->rpm_buf,
> + desc->rpm_buf_size);
> + if (rc)
> + dev_warn(dev, "RPM fabric sleep vote failed: %d\n",
> + rc);
> + }
> + }
> + }
> +
> + provider = &qp->provider;
> + provider->dev = dev;
> + provider->set = msm8660_icc_set;
> + provider->aggregate = msm8660_icc_aggregate;
> + provider->xlate = of_icc_xlate_onecell;
> + provider->data = data;
> + provider->get_bw = msm8660_get_bw;
> +
> + icc_provider_init(provider);
> +
> + for (i = 0; i < num_nodes; i++) {
> + size_t j;
> +
> + if (!qnodes[i])
> + continue;
> +
> + node = icc_node_create(qnodes[i]->id);

Please switch to the dynamic ICC node numbers (see icc-rpmh)

> + if (IS_ERR(node)) {
> + ret = PTR_ERR(node);
> + goto err_remove_nodes;
> + }
> +
> + node->name = qnodes[i]->name;
> + node->data = qnodes[i];
> + icc_node_add(node, provider);
> +

--
With best wishes
Dmitry