Re: [PATCH net-next v4 1/2] net: ti: icssg-prueth: Add Frame Preemption MAC Merge support

From: Vladimir Oltean

Date: Thu Feb 26 2026 - 13:00:00 EST


Hi Meghana,

On Tue, Feb 24, 2026 at 06:18:02PM +0530, Meghana Malladi wrote:
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_qos.c b/drivers/net/ethernet/ti/icssg/icssg_qos.c
> new file mode 100644
> index 000000000000..388dfcea426b
> --- /dev/null
> +++ b/drivers/net/ethernet/ti/icssg/icssg_qos.c
> @@ -0,0 +1,223 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Texas Instruments ICSSG PRUETH QoS submodule
> + * Copyright (C) 2023 Texas Instruments Incorporated - http://www.ti.com/
> + */
> +
> +#include "icssg_prueth.h"
> +#include "icssg_switch_map.h"
> +
> +static void icssg_iet_set_preempt_mask(struct prueth_emac *emac, u8 preemptible_tcs)
> +{
> + void __iomem *config = emac->dram.va + ICSSG_CONFIG_OFFSET;
> + struct prueth_qos_mqprio *p_mqprio = &emac->qos.mqprio;
> + struct tc_mqprio_qopt *qopt = &p_mqprio->mqprio.qopt;
> + int prempt_mask = 0, i;
> + u8 tc;
> +
> + /* Configure the queues based on the preemptible tc map set by the user */
> + for (tc = 0; tc < p_mqprio->mqprio.qopt.num_tc; tc++) {
> + /* check if the tc is preemptive or not */
> + if (preemptible_tcs & BIT(tc)) {
> + for (i = qopt->offset[tc]; i < qopt->offset[tc] + qopt->count[tc]; i++) {
> + /* Set all the queues in this tc as preemptive queues */
> + writeb(BIT(4), config + EXPRESS_PRE_EMPTIVE_Q_MAP + i);
> + prempt_mask &= ~BIT(i);
> + }
> + } else {
> + /* Set all the queues in this tc as express queues */
> + for (i = qopt->offset[tc]; i < qopt->offset[tc] + qopt->count[tc]; i++) {
> + writeb(0, config + EXPRESS_PRE_EMPTIVE_Q_MAP + i);
> + prempt_mask |= BIT(i);
> + }
> + }
> + netdev_set_tc_queue(emac->ndev, tc, qopt->count[tc], qopt->offset[tc]);
> + }
> + writeb(prempt_mask, config + EXPRESS_PRE_EMPTIVE_Q_MASK);
> +}

Shouldn't the preemptible TCs be committed to hardware only if FPE is
active? The callers pay absolutely no regard to that.

> +
> +static void icssg_config_ietfpe(struct work_struct *work)
> +{
> + struct prueth_qos_iet *iet =
> + container_of(work, struct prueth_qos_iet, fpe_config_task);
> + void __iomem *config = iet->emac->dram.va + ICSSG_CONFIG_OFFSET;
> + struct prueth_qos_mqprio *p_mqprio = &iet->emac->qos.mqprio;
> + bool enable = !!atomic_read(&iet->enable_fpe_config);
> + int ret;
> + u8 val;
> +
> + if (!netif_running(iet->emac->ndev))
> + return;
> +
> + mutex_lock(&iet->fpe_lock);
> +
> + /* Update FPE Tx enable bit (PRE_EMPTION_ENABLE_TX) if
> + * fpe_enabled is set to enable MM in Tx direction
> + */
> + writeb(enable ? 1 : 0, config + PRE_EMPTION_ENABLE_TX);
> +
> + /* If FPE is to be enabled, first configure MAC Verify state
> + * machine in firmware as firmware kicks the Verify process
> + * as soon as ICSSG_EMAC_PORT_PREMPT_TX_ENABLE command is
> + * received.
> + */
> + if (enable && iet->mac_verify_configure) {
> + writeb(1, config + PRE_EMPTION_ENABLE_VERIFY);
> + writew(iet->tx_min_frag_size, config + PRE_EMPTION_ADD_FRAG_SIZE_LOCAL);
> + writel(iet->verify_time_ms, config + PRE_EMPTION_VERIFY_TIME);
> + }
> +
> + /* Send command to enable FPE Tx side. Rx is always enabled */
> + ret = icssg_set_port_state(iet->emac,
> + enable ? ICSSG_EMAC_PORT_PREMPT_TX_ENABLE :
> + ICSSG_EMAC_PORT_PREMPT_TX_DISABLE);
> + if (ret) {
> + netdev_err(iet->emac->ndev, "TX preempt %s command failed\n",
> + str_enable_disable(enable));
> + writeb(0, config + PRE_EMPTION_ENABLE_VERIFY);
> + iet->verify_status = ICSSG_IETFPE_STATE_DISABLED;
> + goto unlock;
> + }
> +
> + if (enable && iet->mac_verify_configure) {
> + ret = readb_poll_timeout(config + PRE_EMPTION_VERIFY_STATUS, iet->verify_status,
> + (iet->verify_status == ICSSG_IETFPE_STATE_SUCCEEDED),
> + USEC_PER_MSEC, 5 * USEC_PER_SEC);

You are sleeping up to 5 seconds in the system_percpu_wq kernel-wide
workqueue, blocking the kernel from making any sort of progress with
other items in this workqueue. As include/linux/workqueue.h puts it:
"Don't queue works which can run for too long.".

I guess you should allocate a driver-private workqueue on which you can
sleep as much as you want. Or make the icssg_config_ietfpe task smarter,
to be stateful and reschedule itself until the PRE_EMPTION_VERIFY_STATUS
changes, or a timeout expires. But that's more complicated.

Side note: I had this question on my mind - all contexts from which you
call schedule_work(&iet->fpe_config_task) are sleepable, so why not just
invoke icssg_config_ietfpe() via a direct function call instead?
I guess that's why - it takes too long to reasonably wait for it from
call sites like ethtool, phylink etc. I would make sure this design
decision is part of the commit message.

But let's not lie to ourselves. Having a deferred fpe_config_task
creates its own problems which you are not handling well.

Consider:
- iet->tx_min_frag_size
- iet->verify_time_ms

Writer is emac_set_mm(), reader is icssg_config_ietfpe(). But the reader
can run concurrently with the writer. This means the reader can pick up
and send to firmware settings in an inconsistent state (old tx_min_frag_size
with new verify_time_ms). Configuration which was never requested by the user.

You have a mutex &iet->fpe_lock. Does it help? No.
You have an atomic &iet->enable_fpe_config. Does it help? Also no.

Please try to think of a synchronization pattern where the config
writer, emac_set_mm(), stops or otherwise blocks out the deferred reader
while it's making changes.

In addition, schedule_work() while the work is already scheduled will do
nothing. So if the fpe_config_task takes close to 5 seconds and the user
sends multiple ethtool --set-mm requests in that time, they will be
ignored or incorrectly processed.

Also, iet->fpe_active is problematic too. It has emac_get_mm(),
icssg_qos_link_up() and icssg_qos_link_down() as readers, and
icssg_config_ietfpe() as writer. But it's not obvious what the correct
access pattern is. These things rarely work correctly by chance :(

I'm sorry that I don't have more time to untangle everything and see
what would work best. As a result, the comments above are just "some"
observations. Please try to be more deliberate with the synchronization
procedures, explain them and I am more than happy to double-check their
sanity. It's just that not much effort seems to have been put into the
current proposal.