Re: [PATCH net-next v6 06/12] net: pse-pd: Add support for budget evaluation strategies
From: Oleksij Rempel
Date: Mon Mar 17 2025 - 08:46:38 EST
On Tue, Mar 04, 2025 at 11:18:55AM +0100, Kory Maincent wrote:
> From: Kory Maincent (Dent Project) <kory.maincent@xxxxxxxxxxx>
> +/**
> + * pse_disable_pi_prio - Disable all PIs of a given priority inside a PSE
> + * power domain
> + * @pcdev: a pointer to the PSE
> + * @pw_d: a pointer to the PSE power domain
> + * @prio: priority
> + *
> + * Return: 0 on success and failure value on error
> + */
> +static int pse_disable_pi_prio(struct pse_controller_dev *pcdev,
> + struct pse_power_domain *pw_d,
> + int prio)
> +{
> + int i;
> +
Should we lock the pi[] array at some level?
> + for (i = 0; i < pcdev->nr_lines; i++) {
> + int ret;
> +
> + if (pcdev->pi[i].prio != prio ||
> + pcdev->pi[i].pw_d != pw_d ||
> + !pcdev->pi[i].admin_state_enabled)
> + continue;
> +
> + ret = pse_disable_pi_pol(pcdev, i);
If the PSE has many lower-priority ports, the same set of ports could be
repeatedly shut down while higher-priority ports keep power
indefinitely.
This could result in a starvation issue, where lower-priority group of
ports may never get a chance to stay enabled, even if power briefly
becomes available.
To fix this, we could:
- Disallow identical priorities in static mode to ensure a clear
shutdown order.
- Modify pse_disable_pi_prio() to track freed power and stop
disabling once enough power is recovered.
static int pse_disable_pi_prio(struct pse_controller_dev *pcdev,
struct pse_power_domain *pw_d,
int prio, int required_power)
{
int i, ret;
int freed_power = 0;
mutex_lock(&pcdev->lock);
for (i = 0; i < pcdev->nr_lines; i++) {
if (pcdev->pi[i].prio != prio ||
pcdev->pi[i].pw_d != pw_d ||
!pcdev->pi[i].admin_state_enabled)
continue;
ret = pse_disable_pi_pol(pcdev, i);
if (ret == 0)
freed_power += pcdev->pi[i].pw_allocated_mW;
/* Stop once we have freed enough power */
if (freed_power >= required_power)
break;
}
mutex_unlock(&pcdev->lock);
return ret;
}
The current approach still introduces an implicit priority based on loop
execution order, but since it's predictable, it can serve as a reasonable
default.
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}
....
> +/**
> + * pse_ethtool_set_prio - Set PSE PI priority according to the budget
> + * evaluation strategy
> + * @psec: PSE control pointer
> + * @extack: extack for reporting useful error messages
> + * @prio: priovity value
> + *
> + * Return: 0 on success and failure value on error
> + */
> +int pse_ethtool_set_prio(struct pse_control *psec,
> + struct netlink_ext_ack *extack,
> + unsigned int prio)
> +{
> + struct pse_controller_dev *pcdev = psec->pcdev;
> + const struct pse_controller_ops *ops;
> + int ret = 0;
> +
> + if (!pcdev->pi[psec->id].pw_d) {
> + NL_SET_ERR_MSG(extack, "no power domain attached");
> + return -EOPNOTSUPP;
> + }
> +
> + /* We don't want priority change in the middle of an
> + * enable/disable call or a priority mode change
> + */
> + mutex_lock(&pcdev->lock);
> + switch (pcdev->pi[psec->id].pw_d->budget_eval_strategy) {
> + case ETHTOOL_PSE_BUDGET_EVAL_STRAT_STATIC:
> + if (prio > pcdev->nr_lines) {
> + NL_SET_ERR_MSG_FMT(extack,
> + "priority %d exceed priority max %d",
> + prio, pcdev->nr_lines);
> + ret = -ERANGE;
> + goto out;
> + }
> +
> + pcdev->pi[psec->id].prio = prio;
In case we already out of the budget, we will need to re-evaluate the
prios. New configuration may affect state of ports.
Potentially we may need a bulk interface to assign prios, to speed-up
reconfiguration. But it is not needed right now.
> + break;
> +
> + case ETHTOOL_PSE_BUDGET_EVAL_STRAT_DYNAMIC:
> + ops = psec->pcdev->ops;
> + if (!ops->pi_set_prio) {
> + NL_SET_ERR_MSG(extack,
> + "pse driver does not support setting port priority");
> + ret = -EOPNOTSUPP;
> + goto out;
> + }
> +
> + if (prio > pcdev->pis_prio_max) {
> + NL_SET_ERR_MSG_FMT(extack,
> + "priority %d exceed priority max %d",
> + prio, pcdev->pis_prio_max);
> + ret = -ERANGE;
> + goto out;
> + }
> +
> + ret = ops->pi_set_prio(pcdev, psec->id, prio);
Here too, but in case of microchip PSE it will happen in the firmware.
May be add here a comment that currently it is done in firmware and to
be extended for the kernel based implementation.
> + break;
> +
> + default:
> + ret = -EOPNOTSUPP;
> + }
> +
> +out:
> + mutex_unlock(&pcdev->lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(pse_ethtool_set_prio);
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |