Re: [PATCH v2 07/12] NTB: Introduce functions to calculate multi-port resource index
From: Serge Semin
Date: Wed Mar 06 2019 - 17:45:52 EST
On Wed, Mar 06, 2019 at 12:11:11PM -0700, Logan Gunthorpe wrote:
>
>
> On 2019-03-05 6:24 p.m., Serge Semin wrote:
> >> + * In a 5 peer system, this function will return the following matrix
> >> + *
> >> + * pidx \ port 0 1 2 3 4
> >> + * 0 0 0 1 2 3
> >> + * 1 0 1 2 3 4
> >> + * 2 0 1 2 3 4
> >> + * 3 0 1 2 3 4
> >> + *
>
> Oh, first, oops: looks like I copied this down wrong anyway; the code
> was what I had intended, but the documented example should have been:
>
> pidx \ local_port 0 1 2 3 4
> 0 0 0 1 2 3
> 1 0 1 1 2 3
> 2 0 1 2 2 3
> 3 0 1 2 3 3
>
> And this is definitely the correct table we are aiming for.
> ntb_peer_resource_idx() is supposed to return the result of
> ntb_peer_port_idx(ntb, local_port) when run on the peer specified by pidx.
>
> Note: this table also makes sense because it only uses 4 resources for 5
> ports which is the best case scenario. (In other words, to communicate
> between N ports, N-1 resources are required on each peer).
>
Yes, it does use as much and as tight resources as it possible, but
only for the case of pure integer ports numbering. While in case if
there are gaps in the port numbers space (which is the only case we have
in supported hardware at this moment) it will lead to a failure if there
are ports with higher numbers, than there are MWs available (MWs availability
depends on the IDT chip firmware). Additionally it creates gaps in the MWs
space if physical ports are numbered with gaps. Since the only multi-port
device we've got now is IDT and it always has it' ports numbered with gaps as I
described, then the current implementation will definitely produced the
problems.
> > This table is too simplified to represent a generic case of port-index
> > mapping table. In particular the IDT PCIe switch got it ports numbered
> > with uneven integers like: 0 2 4 6 8 12 16 20 or 0 8 16, and so on.
> > Moreover some of the ports might be disabled or may have NTB functions
> > deactivated, in which case these ports shouldn't be considered by NTB subsystem
> > at all. Basically we may have any increasing subset of that port
> > numbers depending on the current IDT PCIe-switch ports setup.
>
> Yes, I did not consider situations where there would be gaps in the
> "port number" space. It wasn't at all clear from the code that this was
> possible. Switchtec hardware could be configured for such an
> arrangement, but I don't know why anyone would do that as it just
> needlessly complicates everything.
>
> As you point out, with a gap, we end up with something that is wrong:
>
> pidx \ port 0 1 3 4 5
> 0 0 0 2 3 4
> 1 0 1 2 3 4
> 2 0 1 3 3 4
> 3 0 1 3 4 4
>
> Here, the relationship between ntb_peer_resource_idx() and
> ntb_peer_port_idx() is not maintained and it seems to prescribe 5
> resources for 5 ports. If there were more gaps it would be even more wrong.
>
Exactly. The table will look even worse for the port numbers: 0 2 4 6 8 12 16 20.
> >> +static inline int ntb_peer_resource_idx(struct ntb_dev *ntb, int pidx)
> >> +{
> >> + int local_port, peer_port;
> >> +
> >> + if (pidx >= ntb_peer_port_count(ntb))
> >> + return -EINVAL;
> >> +
> >> + local_port = ntb_port_number(ntb);
> >> + peer_port = ntb_peer_port_number(ntb, pidx);
> >> +
> >> + if (peer_port < local_port)
> >> + return local_port - 1;
> >> + else
> >> + return local_port;
> >> +}
> >> +
> >
> > Instead of redefining the port-index table we can just fix the
> > ntb_peer_resource_idx() method, so it would return a global port index
> > instead of some number based on the port number. It can be done just by
> > the next modification:
> >
> > + if (peer_port <= local_port)
> > + return pidx;
> > + else
> > + return pidx + 1;
> >
>
> This creates a table that looks like:
>
> pidx \ port 0 1 2 3 4
> 0 1 0 0 0 0
> 1 2 2 1 1 1
> 2 3 3 3 2 2
> 3 4 4 4 4 3
>
> Which is not correct. In fact, it seems to require 5 resources for 5
> ports. This appears to be what is done in the current ntb_perf and I
> think I figured it out several months ago but it's way too messy and
> hard to understand and I don't want to spend the time to figure it out
> again.
>
Yes, this is how it used to be done in ntb_pingpong and is still done in the
ntb_perf driver. And it is correctly working. As I already described and
you wrote further, this table provides a Logical Ports numbering space:
peer port \ local port 0 2 4 6 8 12 16 20
0 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1
4 2 2 2 2 2 2 2 2
6 3 3 3 3 3 3 3 3
8 4 4 4 4 4 4 4 4
12 5 5 5 5 5 5 5 5
16 6 6 6 6 6 6 6 6
20 7 7 7 7 7 7 7 7
(although I'd call it Global Port Indexes space)
Currently local port indexes don't enumerate the local port number. So
if you convert that table into the one you provided in the function
comment, then it'll look like this (similar to what you called incorrect):
pidx \ port 0 2 4 6 8 12 16 20
0 1 0 0 0 0 0 0 0
1 2 2 1 1 1 1 1 1
2 3 3 3 2 2 2 2 2
3 4 4 4 4 3 3 3 3
4 5 5 5 5 5 4 4 4
5 6 6 6 6 6 6 5 5
6 7 7 7 7 7 7 7 6
Yes, by using this table we'll waste one resource as always existing
gap (is it the only incorrect thing you had in mind?). But it is smaller
problem than to use physical port numbers, which produces much bigger gaps
in case of your table implementation as well.
Note, in addition in this case you'd need to reconsider your algorithm of the
resources initialization. Lets for example take alook at Port 0. You'd need
to have its outbound memory windows [1-7] pointing to the peers with ports
[2,4,...,20] (correspond to pidx [0-6] of Port 0). So in this case Port 2 would
have a port 0 inbound MW #1 retrieving data from Port 0 outbound MW #1, Port 4
would have a port 0 inbound MW #2 retrieving data from Port 0 outbound MW #2,
and so on.
So your current approach is inbound MW-centralized, while mine is developed
around the outbound MWs.
> IMO, in order to support gaps, we'd need to, on some layer, create an
> un-gapped numbering scheme for the ports. I think the easiest thing is
> to just have Logical and Physical port numbers; so we would have
> something like:
>
> Physical Port Number: 0 2 4 6 8 12 16 20
> Logical Port Number: 0 1 2 3 4 5 6 7
> Peer Index (Port 0): x 0 1 2 3 4 5 6
> Port Index (Port 8): 0 1 2 3 x 4 5 6
> (etc)
That's what I suggested in the two possible solutions:
1st solution: replace current pidx with Logical Port Number,
2nd solution: alter ntb_peer_resource_idx() so it would return the Logical Port Number.
IMO In case of the 2nd solution I'd also suggest to rename the
ntb_peer_resource_idx() method into ntb_peer_port_global_idx(),
and then consider the current port indexes used in the NTB API
as local port indexes. The resource indexing can be abstracted
by a macro like this:
#define ntb_peer_resource_idx ntb_peer_port_global_idx
Finally in order to close the space up we'd also need to define
a method: ntb_port_global_idx(), which would return a Logical (global)
index of local port.
>
> Where the Physical Port Number is whatever the hardware uses and the
> logical port number is a numbering scheme starting with zero with no
> gaps. Then the port indexes are still as we currently have them. If we
> say that the port numbers we have now are the Logical Port Number, then
> ntb_peer_resource_idx() is correct.
>
Current port numbers are the physical port numbers with gaps. That's why we
introduced the port-index NTB API abstraction in the first place, to have these gaps
eliminated and to provide a simple way of bulk setup. Although that abstraction
turned out not that suitable to distribute the shared resources. So
the Logical (Global) indexing is needed to do it (that's what ntb_pingpong used
to do and ntb_perf still does now).
> I would strongly argue that the clients don't need to know anything
> about the Physical Port Number and these should be handled strictly
> inside the drivers. If multiple drivers need to do something similar to
> map the logical to physical port numbers then we should introduce helper
> functions to allow them to do so. If the Physical Numbers are not
> contained in the driver than the API would need to be expanded to expose
> which numbers are actually used to avoid needing to constantly loop
> through all the indexes to find this out.
>
Absolutely agree with you. The main idea of NTB API was to provide a set
of methods to access the NTB hardware without any abstractions but
with possible useful helpers, like your NTB MSI library, or transport library,
or anything else. So the physical port numbers must be available for
the client drivers.
> On a similar vein, I'd suggest that most clients shouldn't even really
> need to do anything with the Logical Port Number and should deal largely
> with Port Indexes. Ideally, Logical Port Numbers should only be used by
> helper code in the common layer to help assign resources used by the
> clients (like ntb_peer_resource_idx()).
>
This is the main question. Do we really need the current port indexes
implementation at all? After all these years of NTB API usage I don't really
see it useful in any case except to loop over the outbound MW resources
automatically skipping the local port (usefulness of this is also questionable).
As I already said I created the port-index table this way due to the IDT NTB
MWs peculiarity, which doesn't seem to me a big problem now comparing to all
these additional complications we intend to introduce.
The rest of the drivers code really need to have the Logical (global)
port indexes, at least to distribute the shared resources, and don't use
the current pidx that much.
Wouldn't it be better to just redefine the current port-index table
in the following way?
ntb_port_number() - local physical port number,
ntb_port_idx() - local port logical (global) index,
ntb_peer_port_count() - total number of ports NTB device provide (including the local one),
ntb_peer_port_number() - physical port number of the peer with passed logical port index,
ntb_peer_port_idx - logical port index of the passed physical port number.
while currently we have:
ntb_port_number() - local physical port number,
ntb_peer_port_count() - total number of ports NTB device provide (excluding the local one),
ntb_peer_port_number() - physical port number of the peer with passed port index,
ntb_peer_port_idx - port index of the passed physical port number;
-Sergey
> This world view isn't far off from what we have now, though you *may*
> need to adjust your IDT driver and we will have to eventually clean up
> the existing test clients to use the new helper functions.
>
> > Personally I'd prefer the first solution even though it may lead to the
> > "Unsupported TLP" errors and cause a greater code changes. Here is why:
> > 1) the error might be IDT-specific, so we shouldn't limit the API due to
> > one particular hardware peculiarity,
> > 2) port-index table with global indexes implementation shall simplify the IDT
> > NTB hw driver and provide a cleaner NTB API with simpler shared resources
> > utilization code.
>
> > The final decision is after the NTB subsystem maintainers. If they agree with
> > solution #1 I'll send a corresponding patchset on this week, so you can
> > alter this patchset being based on it.
>
> I think what we have right now is close enough and we just have to clean
> up the code and fix things. I don't think we need to do another big
> change to the semantics. I *certainly* don't want to risk breaking
> everything again to do it.
> Logan
>
> --
> You received this message because you are subscribed to the Google Groups "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to linux-ntb+unsubscribe@xxxxxxxxxxxxxxxxx
> To post to this group, send email to linux-ntb@xxxxxxxxxxxxxxxxx
> To view this discussion on the web visit https://groups.google.com/d/msgid/linux-ntb/bd72f24f-5982-0fe7-59df-2fbbfe9f798a%40deltatee.com.
> For more options, visit https://groups.google.com/d/optout.