Re: [PATCH net-next 00/12] Add support for PSE port priority

From: Kyle Swenson
Date: Wed Oct 09 2024 - 13:46:11 EST


Hello Kory,

On Wed, Oct 09, 2024 at 05:04:00PM +0200, Kory Maincent wrote:
> Hello Kyle,
>
> On Wed, 9 Oct 2024 13:54:51 +0000
> Kyle Swenson <kyle.swenson@xxxxxxxx> wrote:
>
> > Hello Kory,
> >
> > On Wed, Oct 02, 2024 at 06:27:56PM +0200, Kory Maincent wrote:
> > > From: Kory Maincent (Dent Project) <kory.maincent@xxxxxxxxxxx>
> > >
> > > This series brings support for port priority in the PSE subsystem.
> > > PSE controllers can set priorities to decide which ports should be
> > > turned off in case of special events like over-current.
> >
> > First off, great work here. I've read through the patches in the series and
> > have a pretty good idea of what you're trying to achieve: use the PSE
> > controller's idea of "port priority" and expose this to userspace via ethtool.
> >
> > I think this is probably sufficient but I wanted to share my experience
> > supporting a system level PSE power budget with PSE port priorities across
> > different PSE controllers through the same userspace interface such that
> > userspace doesn't know or care about the underlying PSE controller.
> >
> > Out of the three PSE controllers I'm aware of (Microchip's PD692x0, TI's
> > TPS2388x, and LTC's LTC4266), the PD692x0 definitely has the most advanced
> > configuration, supporting concepts like a system (well, manager) level budget
> > and powering off lower priority ports in the event that the port power
> > consumption is greater than the system budget.
> >
> > When we experimented with this feature in our routers, we found that it uses
> > the dynamic power consumed by each port: literally, the sum of
> > port current * port voltage across all the ports. While this behavior
> > technically saves the system from resetting or worse, it causes a bit of a
> > problem with lower priority ports getting powered off depending on the
> > behavior (power consumption) of unrelated devices.
> >
> > As an example, let's say we've got 4 devices, all powered, and we're close to
> > the power budget. One of the devices starts consuming more power (perhaps
> > its modem just powered on), but not more than its class limit. Say this
> > device consumes enough power to exceed the configured power budget, causing
> > the lowest priority device to be powered off. This is the documented and
> > intended behavior of the PD692x0 chipset, but causes an unpleasant user
> > experience because it's not really clear why some device was powered down all
> > of a sudden. Was it because someone unplugged it? Or because the modem on the
> > high priority device turned on? Or maybe that device had an overcurrent?
> > It'd be impossible to tell, and even worse, by the time someone is able to
> > physically look at the switch, the low priority device might be back online
> > (perhaps the modem on the high priority device powered off).
> >
> > This behavior is unique to the PD692x0; I'm much less familiar with the
> > TPS2388x's idea of port priority, but it is very different from the PD692x0.
> > Frankly, the behavior of the OSS pin is confusing, and since we don't use the
> > PSE controllers' idea of port priority, it was safe to ignore it. Finally, the
> > LTC4266 has a "masked shutdown" ability where a predetermined set of ports is
> > shut down when a specific pin (MSD) is driven low. Like the TPS2388x's OSS
> > pin, we ignore this feature on the LTC4266.
> >
> > If the end-goal here is to have a device-independent idea of "port priority" I
> > think we need to add a level of indirection between the port priority concept
> > and the actual PSE hardware. The indirection would enable a system with
> > multiple (possibly even heterogeneous) PSE chips to have a unified idea of
> > port priority. The way we've implemented this in our routers is by putting
> > the PSE controllers in "semi-auto" mode, where they continually detect and
> > classify PDs (powered devices), but do not power them until instructed to do
> > so. The mechanism that decides to power a particular port or not (for lack
> > of a better term, "budgeting logic") uses the available system power budget
> > (configured from userspace), the relative port priorities (also configured
> > from userspace) and the class of a detected PD. The classification result is
> > used to determine the _maximum_ power a particular PD might draw, and that is
> > the value that is subtracted from the power budget.
> >
> > Using the PD's classification and then allocating it the maximum power for
> > that class enables a non-technical installer to plug in all the PDs at the
> > switch, and observe if all the PDs are powered (or not). But the important
> > part is (unless the port priorities or power budget are changed from
> > userspace) the devices that are powered won't change due to dynamic power
> > consumption of the other devices.
> >
> > I'm not sure what the right path is for the kernel, and I'm not sure how this
> > would look with the regulator integration, nor am I sure what the userspace
> > API should look like (we used sysfs, but that's probably not ideal for
> > upstream). It's also not clear how much of the budgeting logic should be in
> > the kernel, if any. Despite that, hopefully sharing our experience is
> > insightful and/or helpful. If not, feel free to ignore it. In any case,
> > you've got my
>
> Thanks for your review and for sharing your PSE experience.
> It is indeed insightful for further development and updates of this series.

Excellent, glad to hear it.

> So you are saying that, from a user experience point of view, the port
> priority feature is not user-friendly because we don't know why a port has
> been shut down. Even if we can report which port caused the over-current
> event, you still think it is not useful?

Well, not quite. I think the concept of a "port priority" is useful,
but I don't know that the PD692xx's concept of "port priority" is what
we want. The issue is the PD692xx's budgeting algorithm is based on
dynamic power used (i.e. the total power used at any given time). Since
this is, well, dynamic, it makes it confusing when a lower priority port
is powered off due to the runtime behavior of higher-priority ports.
It's even more confusing if the implicit or default port priorities are
used.
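To make the dynamic behavior concrete, here is a rough model of how I
understand a dynamic-power budget to behave. It's a hypothetical sketch,
not PD692x0 firmware or driver code; the struct, the field names and the
"larger number = higher priority" convention are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port model; not taken from any PSE driver or datasheet. */
struct dyn_port {
	unsigned int prio;    /* larger value = more important (assumption) */
	unsigned int used_mw; /* power measured right now: I * V at the port */
	bool powered;
};

/*
 * Dynamic-power budgeting model: while the *measured* total exceeds the
 * budget, power off the lowest-priority port that is still powered.
 * Returns the number of ports shed.
 */
static size_t shed_on_dynamic_power(struct dyn_port *ports, size_t n,
				    unsigned int budget_mw)
{
	size_t shed = 0;

	for (;;) {
		unsigned long total = 0;
		struct dyn_port *victim = NULL;

		for (size_t i = 0; i < n; i++) {
			if (!ports[i].powered)
				continue;
			total += ports[i].used_mw;
			if (!victim || ports[i].prio < victim->prio)
				victim = &ports[i];
		}
		if (total <= budget_mw || !victim)
			return shed;

		/* A powered-off port no longer draws anything. */
		victim->powered = false;
		victim->used_mw = 0;
		shed++;
	}
}
```

For example, with a 60 W budget and ports drawing 20, 15, 15 and 8 W
(priorities 4 down to 1), nothing is shed. If the priority-4 port's draw
rises to 25 W, the total hits 63 W and the priority-1 port is powered off,
even though its own draw never changed.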

Instead, we found that using the maximum power that is allowed to be drawn
by a particular PD's class (set by the IEEE standard) is more user
friendly, because the set of devices that are powered won't change
(unless priorities are changed, or the system budget is changed).
For example, if we've got 4 devices plugged in, and the three highest
priority devices consume all the power budget, the lowest priority
device won't ever be powered. There isn't a case where the lowest
priority device will be shut down because a higher priority device
starts consuming more power at some point in the future.
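A minimal sketch of that class-based allocation, assuming the controllers
have already detected and classified each PD (semi-auto mode) and that
the class maximum in mW is known for each port. The names, the
MAX_PORTS bound and the priority convention are hypothetical, for
illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 48 /* arbitrary bound for this sketch */

/* Hypothetical port model; names are illustrative only. */
struct cls_port {
	unsigned int prio;         /* larger value = more important (assumption) */
	bool pd_detected;          /* PD detected and classified */
	unsigned int class_max_mw; /* max PSE power for the PD's class */
	bool powered;
};

/*
 * Walk ports from highest to lowest priority and reserve each PD's class
 * maximum (not its measured draw). The outcome depends only on priorities,
 * the budget and the detected classes. Ports beyond MAX_PORTS are ignored.
 * Returns the unallocated budget.
 */
static unsigned int allocate_by_class(struct cls_port *ports, size_t n,
				      unsigned int budget_mw)
{
	size_t order[MAX_PORTS];
	unsigned int remaining = budget_mw;

	if (n > MAX_PORTS)
		n = MAX_PORTS;

	/* Stable insertion sort of port indices, highest priority first. */
	for (size_t i = 0; i < n; i++) {
		size_t j = i;

		while (j > 0 && ports[order[j - 1]].prio < ports[i].prio) {
			order[j] = order[j - 1];
			j--;
		}
		order[j] = i;
	}

	for (size_t k = 0; k < n; k++) {
		struct cls_port *p = &ports[order[k]];

		p->powered = false;
		if (!p->pd_detected || p->class_max_mw > remaining)
			continue; /* doesn't fit: skip, keep going */
		p->powered = true;
		remaining -= p->class_max_mw;
	}
	return remaining;
}
```

With a 60.8 W budget and PDs of class 4, 3, 3 and 2 (30, 15.4, 15.4 and
7 W) in priority order, the first three take the whole budget and the
class 2 PD stays off no matter what anyone draws at runtime. One policy
choice is baked in: a port that doesn't fit is skipped, so a smaller
lower-priority PD can still be powered; stopping at the first port that
doesn't fit is the other obvious option.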

> We could have several cases for over-power-budget events:
> - The power limit exceeded is the one configured for the port.
> We should shut down only that port, regardless of priority.
> TPS23881 has this behavior when power exceeds Pcut.
> I think the PD692x0 does the same. I need to verify.

I wouldn't call these conditions "over power budget events". I'd call them
"port overcurrent events" and I agree, those only affect the specific
problem port.

> - The power limit exceeded is the global (or PD69208M manager) power budget.
> Here port priority is interesting.
> Is there a way to know which port caused this global power limit excess?
> Should we turn off this port even if it doesn't exceed its own power limit,
> or should we turn off low priority ports?

I think it's important to make a distinction between an "overcurrent"
condition and the condition where we've exceeded the system power
budget. An "overcurrent" is port-specific, and can happen if the PD
consumes more power than the classification of the device allows. For
example, if a Class 3 PD (i.e. an 802.3af PD, also referred to as a Type 1
PD) consumes more than 15.4 W at the PSE, it will be shut down
immediately. This support is required by all the IEEE 802.3 standards
around PoE (.af, .at, and .bt) and is a safety feature. The TPS2388x
implements this with Pcut, the LTC4266 implements this with the Icut
register, and the PD692xx implements it with the port power limit
registers.
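For reference, the per-class limits involved, and the port-local nature of
the check, look roughly like this. The wattages are the IEEE 802.3 class
power values at the PSE; the table and function names are hypothetical.
Real controllers do this in hardware (Pcut, Icut or the port power limit
registers) and apply a fault timer rather than acting on a single
reading, which this sketch ignores.

```c
#include <stdbool.h>

/*
 * Class power at the PSE in mW, indexed by PD class:
 * classes 0-3 from 802.3af, class 4 from 802.3at, classes 5-8 from 802.3bt.
 */
static const unsigned int class_pse_max_mw[] = {
	15400, 4000, 7000, 15400, 30000, 45000, 60000, 75000, 90000,
};

/*
 * Port-level "overcurrent" check: depends only on this port's class and
 * draw, never on priorities or on what other ports are doing.
 */
static bool port_over_class_limit(unsigned int pd_class, unsigned int used_mw)
{
	if (pd_class >= sizeof(class_pse_max_mw) / sizeof(class_pse_max_mw[0]))
		return true; /* unknown class: treat as a fault (assumption) */
	return used_mw > class_pse_max_mw[pd_class];
}
```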

The condition where we've exceeded our system-level power
budget is a little different, in that it causes a port to be shut down
despite that port not exceeding its class power limit. This condition
is the one I'm concerned this series is solving, and it solves it only
for the PD692xx, based on dynamic power consumption.

So I guess I'm suggesting that we take the power budgeting concept out
of the PSE drivers, and put it into software (either kernel or userspace)
instead of the PSE hardware.
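Roughly, what I have in mind is a chip-independent layer like the one
below. This is a hypothetical sketch: pse_ctrl_ops, sw_port,
budget_apply() and the fake controller are invented names, not an
existing or proposed kernel API, and the class table only covers classes
0 through 4. Each controller only has to report the detected class and
switch port power on request, so ports on different PSE chips can share
one priority list and one budget.

```c
#include <stdbool.h>
#include <stddef.h>

/* Class power at the PSE in mW, IEEE 802.3 classes 0-4 only (sketch). */
static const unsigned int class_mw[] = { 15400, 4000, 7000, 15400, 30000 };

/*
 * Hypothetical per-controller hooks for semi-auto mode: the chip keeps
 * detecting and classifying PDs, but only powers a port when told to.
 */
struct pse_ctrl_ops {
	int (*get_class)(void *ctx, unsigned int port); /* -1: no PD */
	void (*set_power)(void *ctx, unsigned int port, bool on);
};

/* One entry per front-panel port; entries may point at different chips. */
struct sw_port {
	const struct pse_ctrl_ops *ops;
	void *ctx;
	unsigned int hw_port;
};

/*
 * Chip-independent budgeting pass. Assumes @ports is already sorted from
 * highest to lowest priority. Meant to be re-run whenever a PD appears or
 * disappears, or the budget or priorities change. Returns unused budget.
 */
static unsigned int budget_apply(const struct sw_port *ports, size_t n,
				 unsigned int budget_mw)
{
	unsigned int remaining = budget_mw;

	for (size_t i = 0; i < n; i++) {
		const struct sw_port *p = &ports[i];
		int cls = p->ops->get_class(p->ctx, p->hw_port);
		bool on = cls >= 0 &&
			  (size_t)cls < sizeof(class_mw) / sizeof(class_mw[0]) &&
			  class_mw[cls] <= remaining;

		if (on)
			remaining -= class_mw[cls];
		p->ops->set_power(p->ctx, p->hw_port, on);
	}
	return remaining;
}

/* Trivial in-memory stand-in for a controller, for illustration only. */
struct fake_chip {
	int pd_class[4];
	bool powered[4];
};

static int fake_get_class(void *ctx, unsigned int port)
{
	return ((struct fake_chip *)ctx)->pd_class[port];
}

static void fake_set_power(void *ctx, unsigned int port, bool on)
{
	((struct fake_chip *)ctx)->powered[port] = on;
}

static const struct pse_ctrl_ops fake_ops = { fake_get_class, fake_set_power };
```

In a real system the chips behind different sw_port entries would each
supply their own ops. Whether budget_apply() lives in the kernel or in
userspace is the open question; the logic itself doesn't care.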

> I can't find a global power budget concept for the TPS23881.

This is because this idea doesn't exist on the TPS2388x.

> I couldn't test this case because I don't have enough load. In fact, maybe
> setting the PD692x0 power bank limit low would work.

Hopefully this helps clarify.

>
> Regards,
> --
> Köry Maincent, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com

Thanks,
Kyle