Re: [PATCH net-next 00/12] Add support for PSE port priority

From: Kory Maincent
Date: Tue Oct 15 2024 - 05:44:12 EST


Hello,

On Thu, 10 Oct 2024 07:42:25 +0200
Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx> wrote:

> > The condition where we've exceeded our system-level power
> > budget is a little different, in that it causes a port to be shut down
> > despite that port not exceeding its class power limit. This condition
> > is the case I'm concerned we're solving in this series, and solving it
> > for the PD692xx case only, and it's based off dynamic power consumption.
> >
> > So I guess I'm suggesting that we take the power budgeting concept out
> > of the PSE drivers, and put it into software (either kernel, userspace)
> > instead of the PSE hardware.
> >
> > > I can't find global power budget concept for the TPS23881.
> >
> > This is because this idea doesn't exist on the TPS2388x.
> >
> > > I couldn't test this case because I don't have enough load. In fact,
> > > maybe by setting the PD692x0 power bank limit low it could work.
> >
> > Hopefully this helps clarify.
>
>
> Thank you for your detailed insights. Before we dive deeper into policies and
> implementations, I’d like to clarify an important point to avoid confusion
> later. When comparing different PSE components, it's crucial to note that the
> Microchip PD692x0 operates in two distinct categories:
> 1. PoE controller (PD692x0)
> 2. PoE manager (PD6920x)
>
> Comparing the PoE controller (PD692x0) with TPS2388x or LTC4266 isn't entirely
> fair, as TPS2388x and LTC4266 are more comparable to the PoE manager
> (PD6920x). The functionalities provided by the PoE controller (PD692x0) are
> things we would need to implement ourselves on the software stack (kernel or
> userspace). The budget heuristic that is implemented in the PD692x0's
> firmware is absent in TPS2388x and LTC4266.
>
> Policy Variants and Implementation
>
> In cases where we are discussing prioritization, we are fundamentally talking
> about over-provisioning. This typically means that while a device advertises a
> certain maximum per-port power capacity (e.g., 95W), the total system power
> budget (e.g., 300W) is insufficient to supply maximum power to all ports
> simultaneously. This is often due to various system limitations, and if there
> were no power limits, prioritization wouldn't be necessary.
>
> The challenge then becomes how to squeeze more Powered Devices (PDs) onto one
> PSE system. Here are two methods for over-provisioning:
>
> 1. Static Method:
>
> This method involves distributing power based on PD classification. It’s
> straightforward and stable, with the software (probably within the PSE
> framework) keeping track of the budget and subtracting the power requested
> by each PD’s class.
>
> Advantages: Every PD gets its promised power at any time, which guarantees
> reliability.
>
> Disadvantages: PD classification steps are large, meaning devices request
> much more power than they actually need. As a result, the power supply may
> only operate at, say, 50% capacity, which is inefficient and wastes money.
>
> 2. Dynamic Method:
>
> To address the inefficiencies of the static method, vendors like Microchip
> have introduced dynamic power budgeting, as seen in the PD692x0 firmware.
> This method monitors the current consumption per port and subtracts it from
> the available power budget. When the budget is exceeded, lower-priority
> ports are shut down.
>
> Advantages: This method optimizes resource utilization, saving costs.
>
> Disadvantages: Low-priority devices may experience instability. A possible
> improvement could involve using LLDP protocols to dynamically configure
> power limits per port, thus allowing us to reduce power on over-consuming
> ports rather than shutting them down entirely.

Indeed, we will have only the static method for PSE controllers that do not
support system power budget management, like the TPS2388x or LTC4266.
Both methods could be supported for "smart" PSE controllers like the PD692x0.

Let's begin with the static method implementation in the PSE framework for now.
It will need the power domain notion you have talked about.
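
To make the static accounting concrete, here is a rough sketch in Python. It is
only an illustration, not the proposed kernel implementation: the PowerDomain
class and its methods are hypothetical names, and the per-class values are the
PSE-side IEEE 802.3 figures, whereas a real driver would report its own limits.

```python
# PSE-side power per PD class in milliwatts (IEEE 802.3af/at/bt figures,
# used here for illustration only).
CLASS_POWER_MW = {0: 15400, 1: 4000, 2: 7000, 3: 15400, 4: 30000,
                  5: 45000, 6: 60000, 7: 75000, 8: 90000}


class PowerDomain:
    """Hypothetical static budget tracker for ports sharing one supply."""

    def __init__(self, budget_mw):
        self.budget_mw = budget_mw
        self.allocated = {}  # port id -> reserved milliwatts

    def remaining_mw(self):
        """Budget left after subtracting every reservation."""
        return self.budget_mw - sum(self.allocated.values())

    def request(self, port, pd_class):
        """Reserve the full class power, or refuse if it does not fit."""
        need = CLASS_POWER_MW[pd_class]
        # A port asking again gets its previous reservation back first.
        available = self.remaining_mw() + self.allocated.get(port, 0)
        if need > available:
            return False
        self.allocated[port] = need
        return True

    def release(self, port):
        """Return a port's reservation to the budget (no-op if unknown)."""
        self.allocated.pop(port, None)


domain = PowerDomain(300000)                      # 300W domain
granted = [domain.request(p, 8) for p in range(3)]  # three class 8 PDs
fourth = domain.request(3, 4)                     # class 4 PD uses the last 30W
fifth = domain.request(4, 3)                      # class 3 PD no longer fits
```

With a 300W domain, three class 8 PDs and one class 4 PD consume the whole
budget on paper, so the class 3 request is refused even if the connected
devices actually draw far less. That is the inefficiency described above.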

> Recommendations for Software Handling
>
> Both methods have their pros and cons. Since the dynamic method is not always
> desirable, and if there's no way to disable it in the PD692x0's firmware, one
> potential workaround could be handling the budget in software and dynamically
> setting per-port limits. For instance, with a total budget of 300W and unused
> ports, we could initially set 95W limits per port. As high-priority PDs (e.g.,
> three 95W devices) are powered, we could dynamically reduce the power limit on
> the remaining ports to 15W, ensuring that no device exceeds that
> classification threshold.
>
> This is just one idea, and there are likely other policy variants we could
> explore. Importantly, I believe these heuristics don’t belong in the kernel
> itself. Instead, the kernel should simply provide the necessary interfaces,
> leaving the policy implementation to userspace management software. At least
> this is a lesson learned from Thermal Management talk at LPC :D

I think the kernel is only missing the PSE notification events to be ready to
leave the port priority policy to userspace.
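
As an illustration of what such a userspace policy could look like once it is
notified of an over-budget condition, here is a rough Python sketch. Everything
in it is an assumption: the ports_to_shed() helper, the input format, and the
convention that a larger priority number means a less important port. How the
event and the per-port consumption would reach userspace is left out entirely.

```python
def ports_to_shed(budget_mw, ports):
    """Pick ports to disable until measured consumption fits the budget.

    ports maps a port id to a (priority, consumed_mw) tuple. A larger
    priority number is treated as less important (assumed convention).
    """
    total = sum(mw for _, mw in ports.values())
    shed = []
    # Walk the ports from least important to most important.
    by_importance = sorted(ports.items(), key=lambda kv: kv[1][0], reverse=True)
    for port, (_prio, mw) in by_importance:
        if total <= budget_mw:
            break
        shed.append(port)
        total -= mw
    return shed


measured = {
    0: (0, 90000),
    1: (0, 90000),
    2: (1, 90000),
    3: (2, 40000),
    4: (3, 20000),
}
to_disable = ports_to_shed(300000, measured)  # 330W measured, 300W budget
```

In this example ports 4 and 3 are selected, in that order. A real policy could
instead lower per-port limits rather than disabling ports, as suggested above.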

Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com