RE: New NTB API Issue

From: Allen Hubbe
Date: Fri Jun 23 2017 - 15:07:47 EST

From: Logan Gunthorpe
> On 23/06/17 07:18 AM, Allen Hubbe wrote:
> > By "remote" do you mean the source or destination of a write?
> Look at how these calls are used in ntb_transport and ntb_perf:
> They both call ntb_peer_mw_get_addr to get the size of the BAR. The size
> is sent via spads to the other side. The other side then uses
> ntb_mw_get_align and applies align_size to the received size.
> > Yes, clients should transfer the address and size information to the peer.
> But then they also need to technically transfer the alignment
> information as well. Which neither of the clients do.

The client's haven't been fully ported to the multi-port api yet. They were only minimally changed to call the new api, but so far other than that they have only been made to work as they had before.

> > Maybe this is the confusion. None of these api calls are to reach across to the peer port, as to
> get the size of the peer's bar. They are to get information from the local port, or to configure the
> local port.
> I like the rule that these api calls are not to reach across the port.
> But then API looks very wrong. Why are we calling one peer_mw_get addr
> and the other mw_get_align? And why does mw_get_align have a peer index?

I regret that the term "peer" is used to distinguish the mw api. Better names perhaps should be ntb_outbound_mw_foo, ntb_inbound_mw_foo; or ntb_src_mw_foo, ntb_dest_mw_foo. I like outbound/inbound, although the names are longer, maybe out/in would be ok.

> And why does mw_get_align have a peer index?

Good question. Serge?

For that matter, why do we not also have peer mw idx in the set of parameters. Working through the example below, it looks like we are lacking a way to say Outbound MW1 on A corresponds with Inbound MW0 on B. It looks like we can only indicate that Port A (not which Outbound MW of Port A) corresponds with Inbound MW0 on B.

> > Some snippets of code would help me understand your interpretation of the api semantics more
> exactly.
> I'm not sure the code to best show this in code, but let me try
> describing an example situation:
> Lets say we have the following mws on each side (this is something that
> is possible with Switchtec hardware):
> Host A BARs:
> mwA0: 2MB size, aligned to 4k, size aligned to 4k
> mwA1: 4MB size, aligned to 4k, size aligned to 4k
> mwA2: 64k size, aligned to 64k, size aligned to 64k
> Host B BARs:
> mwB0: 2MB size, aligned to 4k, size aligned to 4k
> mwB1: 64k size, aligned to 64k, size aligned to 64k

If those are BARs, that corresponds to "outbound", writing something to the BAR at mwA0.

A more complete picture might be:

Host A BARs (aka "outbound" or "peer" memory windows):
peer_mwA0: resource at 0xA00000000 - 0xA00200000 (2MB)
peer_mwA1: resource at 0xA10000000 - 0xA10400000 (4MB)
peer_mwA2: resource at 0xA20000000 - 0xa20010000 (64k)

Host A MWs (aka "inbound" memory windows):
mwA0: 64k max size, aligned to 64k, size aligned to 64k
mwA1: 2MB max size, aligned to 4k, size aligned to 4k

Host A sees Host B on port index 1

Host B BARs (aka "outbound" or "peer" memory windows):
peer_mwB0: resource at 0xB00000000 - 0xB00200000 (2MB)
peer_mwB1: resource at 0xB10000000 - 0xB10010000 (64k)

Host B MWs (aka "inbound" memory windows):
mwB0: 1MB size, aligned to 4k, size aligned to 4k
mwB1: 2MB size, aligned to 4k, size aligned to 4k

Host B sees Host A on port index 4

Outbound memory (aka "peer mw") windows come with a pci resource. We can get the size of the resource, it's physical address, and set up outbound translation if the hardware has that (IDT).

Inbound memory windows (aka "mw") are only used to set up inbound translation, if the hardware has that (Intel, AMD).

To set up end-to-end memory window so that A can write to B, let's use peer_mwA1 and mwB0.

A: ntb_peer_mw_get_addr(peer_mwA1) -> base 0xA10000000, size 4MB
B: ntb_mw_get_align(port4**, mwB0) -> aligned 4k, aligned 4k, max size 1MB
** Serge: do we need port info here, why?

Side A has a resource size of 4MB, but B only supports inbound translation up to 1MB. Side A can only use the first quarter of the 4MB resource.

Side B needs to allocate memory aligned to 4k (the dma address must be aligned to 4k after dma mapping), and a multiple of 4k in size. B may need to set inbound translation so that incoming writes go into this memory. A may also need to set outbound translation.

A: ntb_peer_mw_set_trans(port1**, peer_mwA1, dma_mem_addr, dma_mem_size)
B: ntb_mw_set_trans(port4**, mwB0, dma_mem_addr, dma_mem_size)
** Serge: do we also need the opposing side MW index here?

** Logan: would those changes to the api suit your needs?

> So on Host A: peer_mw_get_addr(idx=1) should return size=4M (from mwA1),
> but it's not clear what mw_get_align(widx=1) should do. I see two
> possibilities:
> 1) Given that it has the opposite sense of peer_mw_get_addr (ie. there
> is no peer in it's name) and that this new api also has a peer index, it
> should return align_size=64k (from mwB1). However, in order to do this,
> Host B must be fully configured (technically the link doesn't have to be
> fully up, but having a link up is the only way for a client to tell if
> Host B is configured or not).
> 2) Given your assertion that these APIs should never reach across the
> link, then one could say it should return align_size=4k. However, in
> this situation the name is really confusing. And the fact that it has a
> pidx argument makes no sense. And the way ntb_transport and ntb_perf use
> it is wrong because they will end up masking the 64K size of mwB1 with
> the 4k align_size from mwA1.
> Does that make more sense?
> Thanks,
> Logan