Re: [PATCH v2 07/12] NTB: Introduce functions to calculate multi-port resource index

From: Serge Semin
Date: Tue Mar 05 2019 - 20:31:36 EST


On Wed, Feb 13, 2019 at 10:54:49AM -0700, Logan Gunthorpe wrote:

Hi

> When using multi-ports each port uses resources (dbs, msgs, mws, etc)
> on every other port. Creating a mapping for these resources such that
> each port has a corresponding resource on every other port is a bit
> tricky.
>
> Introduce the ntb_peer_resource_idx() function for this purpose.
> It returns the peer resource number that will correspond with the
> local peer index on the remote peer.
>
> Also, introduce ntb_peer_highest_mw_idx() which will use
> ntb_peer_resource_idx() but return the MW index starting with the
> highest index and working down.
>
> Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
> Cc: Jon Mason <jdmason@xxxxxxxx>
> Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
> Cc: Allen Hubbe <allenbh@xxxxxxxxx>
> ---
> include/linux/ntb.h | 70 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 70 insertions(+)
>
> diff --git a/include/linux/ntb.h b/include/linux/ntb.h
> index 181d16601dd9..f5c69d853489 100644
> --- a/include/linux/ntb.h
> +++ b/include/linux/ntb.h
> @@ -1502,4 +1502,74 @@ static inline int ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
> return ntb->ops->peer_msg_write(ntb, pidx, midx, msg);
> }
>
> +/**
> + * ntb_peer_resource_idx() - get a resource index for a given peer idx
> + * @ntb: NTB device context.
> + * @pidx: Peer port index.
> + *
> + * When constructing a graph of peers, each remote peer must use a different
> + * resource index (mw, doorbell, etc) to communicate with each other
> + * peer.
> + *
> + * In a two peer system, this function should always return 0 such that
> + * resource 0 points to the remote peer on both ports.
> + *
> + * In a 5 peer system, this function will return the following matrix
> + *
> + * pidx \ port 0 1 2 3 4
> + * 0 0 0 1 2 3
> + * 1 0 1 2 3 4
> + * 2 0 1 2 3 4
> + * 3 0 1 2 3 4
> + *

This table is too simplified to represent a generic case of port-index
mapping table. In particular the IDT PCIe switch got it ports numbered
with uneven integers like: 0 2 4 6 8 12 16 20 or 0 8 16, and so on.
Moreover some of the ports might be disabled or may have NTB functions
deactivated, in which case these ports shouldn't be considered by NTB subsystem
at all. Basically we may have any increasing subset of that port
numbers depending on the current IDT PCIe-switch ports setup.

What I am trying to say by this is that if in real life the NTB MSI library
or some other library used that matrix to calculate the MW index, then most
likely it would either consume nearly all the MWs leaving holes in the space
of MWs or just run out of them since depending on the configuration there might
be from 0 to 24 MWs enabled in a IDT NTB function. In order to solve the problem
we either need the ntb_peer_resource_idx() method somehow fixed so it would use
the pidx instead of the pure port number or we should properly alter the current
NTB API port-index table implementation, so the ntb_pingpong, NTB MSI library and
others wouldn't need to a have some other table to distribute the NTB
resources.

A short preamble of why we introduced the ports-index NTB API. Since each
NTB MW and NTB message registers can be setup to be related with any NTB
device port, we needed to have the port attribute accepted by NTB API
functions. But it turned out the IDT PCIe switch ports are unevenly numbered,
which may cause difficulties in using their numbers for bulk configurations.
So in order to simplify a client code utilizing the multi-ports NTB API (for
example for simpler looping around the device ports), we decided to create the
ports-index table in the NTB API internals. The table is defined by a set of
ntb_port_*()/ntb_peer_port_*() functions. So now all the NTB API methods accept
a port index instead of an irregular port number.

Here is the port-index table currently defined for IDT 89HPES32NT8 PCIe-switch
with all 8 ports having NTB functions enabled:

peer port \ local port 0 2 4 6 8 12 16 20
0 x 0 0 0 0 0 0 0
2 0 x 1 1 1 1 1 1
4 1 1 x 2 2 2 2 2
6 2 2 2 x 3 3 3 3
8 3 3 3 3 x 4 4 4
12 4 4 4 4 4 x 5 5
16 5 5 5 5 5 5 x 6
20 6 6 6 6 6 6 6 x

As you can see currently each NTB device side's got its own port-index
mapping, so each port can access another ports, but not itself.
I implemented it this way intentionally for two reasons:
1) I don't think anyone ever would need to have MW or Message registers
setup pointing to the local port.
2) IDT PCIe switch will report "Unsupported TLP" error in case if there is
an attempt to touch a MW frame initialized with port number matching the local
port number. (Internally IDT PCIe-switch is working with port partitions.
But I didn't want to complicate the API description, so we just utilize the
port numbers naming, which is essentially the same as partitions for NTB.)

The biggest drawback of the selected table representation is that the
port-index mapping is unique for each local port, which makes the
indexes unsuitable to distribute shared resources like outbound
MWs, Doorbells and Scratchpads (if one ever implemented for multi-port
NTB devices). For instance each port can access a port with index 0,
which for local port 0 is port 2, while for the rest of the local ports
it's port 0, and so on.

This drawback already caused complications in the ntb_pingpong driver,
where after your patchset acceptance we use port numbers to distribute
the doorbell bits. The same problem is popping up here, but this time
it can't be solved by the port numbers utilization due to the limited number
of MWs available on IDT.

As I already said there can be two ways to get rid of the complication.
One of them is to redefine the port-index table, so to map the port numbers
to global indexes instead of the local ones, like this:

peer port \ local port 0 2 4 6 8 12 16 20
0 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1
4 2 2 2 2 2 2 2 2
6 3 3 3 3 3 3 3 3
8 4 4 4 4 4 4 4 4
12 5 5 5 5 5 5 5 5
16 6 6 6 6 6 6 6 6
20 7 7 7 7 7 7 7 7

By doing so we'll be able to use the port indexes to distribute the
shared resources like MWs, Doorbells and Scratchpads, but we may get
the problem with "Unsupported TLP" error I described before for IDT
devices, if a NTB API client code tries to define a MW pointing to the
local port on IDT NTB hardware. I don't know whether the same problem
exists for Switchtec multi-ports NTB devices, but this approach shall
definitely lead to cleaner and easier NTB API as well as simplify the
IDT NTB hw driver. We also won't need to have any additional tables like
ntb_peer_resource_idx().

(the second solution is described further in the next comment)

> + * For example, if this function is used to program peer's memory
> + * windows, port 0 will program MW 0 on all it's peers to point to itself.
> + * port 1 will program MW 0 in port 0 to point to itself and MW 1 on all
> + * other ports. etc.
> + *
> + * For the legacy two host case, ntb_port_number() and ntb_peer_port_number()
> + * both return zero and therefore this function will always return zero.
> + * So MW 0 on each host would be programmed to point to the other host.
> + *
> + * Return: the resource index to use for that peer.
> + */
> +static inline int ntb_peer_resource_idx(struct ntb_dev *ntb, int pidx)
> +{
> + int local_port, peer_port;
> +
> + if (pidx >= ntb_peer_port_count(ntb))
> + return -EINVAL;
> +
> + local_port = ntb_port_number(ntb);
> + peer_port = ntb_peer_port_number(ntb, pidx);
> +
> + if (peer_port < local_port)
> + return local_port - 1;
> + else
> + return local_port;
> +}
> +

Instead of redefining the port-index table we can just fix the
ntb_peer_resource_idx() method, so it would return a global port index
instead of some number based on the port number. It can be done just by
the next modification:

+ if (peer_port <= local_port)
+ return pidx;
+ else
+ return pidx + 1;


Personally I'd prefer the first solution even though it may lead to the
"Unsupported TLP" errors and cause a greater code changes. Here is why:
1) the error might be IDT-specific, so we shouldn't limit the API due to
one particular hardware peculiarity,
2) port-index table with global indexes implementation shall simplify the IDT
NTB hw driver and provide a cleaner NTB API with simpler shared resources
utilization code.

The final decision is after the NTB subsystem maintainers. If they agree with
solution #1 I'll send a corresponding patchset on this week, so you can
alter this patchset being based on it.

> +/**
> + * ntb_peer_highest_mw_idx() - get a memory window index for a given peer idx
> + * using the highest index memory windows first
> + *
> + * @ntb: NTB device context.
> + * @pidx: Peer port index.
> + *
> + * Like ntb_peer_resource_idx(), except it returns indexes starting with
> + * last memory window index.
> + *
> + * Return: the resource index to use for that peer.
> + */
> +static inline int ntb_peer_highest_mw_idx(struct ntb_dev *ntb, int pidx)
> +{
> + int ret;
> +
> + ret = ntb_peer_resource_idx(ntb, pidx);
> + if (ret < 0)
> + return ret;
> +
> + return ntb_mw_count(ntb, pidx) - ret - 1;
> +}
> +
> #endif
> --
> 2.19.0
>
> --
> You received this message because you are subscribed to the Google Groups "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to linux-ntb+unsubscribe@xxxxxxxxxxxxxxxxx
> To post to this group, send email to linux-ntb@xxxxxxxxxxxxxxxxx
> To view this discussion on the web visit https://groups.google.com/d/msgid/linux-ntb/20190213175454.7506-8-logang%40deltatee.com.
> For more options, visit https://groups.google.com/d/optout.

-Sergey