Re: [PATCH net-next 08/12] net: pse-pd: pd692x0: Add support for PSE PI priority feature

From: Andrew Lunn
Date: Wed Oct 09 2024 - 12:22:51 EST


On Wed, Oct 09, 2024 at 09:16:33AM +0200, Oleksij Rempel wrote:
> Hi Andrew,
>
> On Tue, Oct 08, 2024 at 06:50:25PM +0200, Andrew Lunn wrote:
> > On Tue, Oct 08, 2024 at 03:57:22PM +0200, Oleksij Rempel wrote:
> > > On Thu, Oct 03, 2024 at 01:41:02AM +0200, Andrew Lunn wrote:
> > > > > + msg = pd692x0_msg_template_list[PD692X0_MSG_SET_PORT_PARAM];
> > > > > + msg.sub[2] = id;
> > > > > + /* Controller priority from 1 to 3 */
> > > > > + msg.data[4] = prio + 1;
> > > >
> > > > Does 0 have a meaning? It just seems an odd design if it does not.
> > >
> > > 0 is not documented. But there are sub-priority which are not directly
> > > configured by user, but affect the system behavior.
> > >
> > > Priority#: Critical – 1; high – 2; low – 3
> > > For ports with the same priority, the PoE Controller sets the
> > > sub-priority according to the logic port number. (Lower number gets
> > > higher priority).
> >
> > With less priorities than ports, there is always going to be something
> > like this.
> >
> > >
> > > Port priority affects:
> > > 1. Power-up order: After a reset, the ports are powered up according to
> > > their priority, highest to lowest, highest priority will power up first.
> > > 2. Shutdown order: When exceeding the power budget, lowest priority
> > > ports will turn off first.
> > >
> > > Should we return sub priorities on the prio get request?
> >
> > I should be optional, since we might not actually know what a
> > particular device is doing. It could pick at random, it could pick a
> > port which is consuming just enough to cover the shortfall if it was
> > turned off, it could pick the highest consumer of the lowest priority
> > etc. Some of these conditions are not going to be easy to describe
> > even if we do know it.
>
> After reviewing the manuals for LTC4266 and TPS2388x, I realized that these
> controllers expose interfaces, but they don't implement prioritization concepts
> themselves.
>
> The LTC4266 and TPS2388x controllers provide only interfaces that allow the
> kernel to manage shutdown and prioritization policies. For TPS2388x, fast
> shutdown is implemented as a port bitmask with only two priorities, handled via
> the OSS pin. Fast shutdown is triggered by the kernel on request by toggling
> the corresponding pin, and the policy - when and why this pin is toggled - is
> defined by the kernel or user space. Slow shutdown, on the other hand, is
> managed via the I2C bus and allows for more refined control, enabling a wider
> range of priorities and more granular policies.
>
> I'll tend to hope we can reuse the proposed ETHTOOL_A_C33_PSE_PRIO interface
> across different PSE controllers. However, it is already being mapped to
> different shutdown concepts: PD692x0 firmware seems to rely on a slow shutdown
> backed by internal policies, while TPS2388x maps it to fast shutdown with
> driver specific policy. This inconsistency could force us to either break the
> UAPI or introduce a new, inconsistent one once we realize TPS2388x fast
> shutdown isn't what we actually need.

We should try to avoid fragmentation of the API, but given there is no
standardisation here, vendors are free to do whatever they want, this
may be difficult. Still, we should try to avoid it.

Maybe we want to consider a generic simple prioritization in the
kernel, plus export a generic model of PSE to user space to allow a
user space manager? This is really policy, and ideally we don't want
policy in the kernel. The in kernel implementation can use the
hardware prioritisation if it supports it, otherwise provide a library
of code which a driver can use to implement simple software
prioritisation?

For a user space manager, we already talked about an event to signal
that we are entering overload. I guess we additionally need an API to
get the current runtime state, what each consumer is actually
consuming, and a management API to shutdown or enable a port due to
overload, which is separate to the administrative state of a port.

The hardware/firmware might provide lots of options and capabilities,
but sometimes it is better it ignore them. Export a basic set which we
expect most PSE controllers can offer, and do the rest in our software
which we have full insight into, and debug and share, unlike firmware
which is a broken black box.

If we decide this is the way we want to go, we should not need to
extend the API for LTC4266 and TPS2388x, you just implement a basic
prioritisation in the driver. If it turns out to be not ideal, it does
not matter so much, so long as we have the understanding that
eventually we can have something better in userspace.

We can maybe do a rough sketch of what the kAPI looks like for a user
space manager, but we can kick the implementation down the road while
the in kernel prioritization is good enough.

Andrew