Re: [PATCH 2/2] IB/mad: Use IDR for agent IDs

From: Jason Gunthorpe
Date: Mon Jun 11 2018 - 12:19:42 EST

On Mon, Jun 11, 2018 at 09:19:14AM +0300, jackm wrote:
> On Sun, 10 Jun 2018 22:42:03 -0600
> Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:
> > Er, the spec has nothing to do with this. In Linux the TID is made
> > unique because the core code provides 32 bits that are unique and the
> > user provides another 32 bits that are unique. The driver cannot
> > change any of those bits without risking non-uniquenes, which is
> > exactly the bug mlx4 created when it stepped outside its bounds and
> > improperly overrode bits in the TID for its own internal use.
> Actually, the opposite is true here. When SRIOV is active, each VM
> generates its *own* TIDs -- with 32 bits of agent number and 32 bits
> of counter.

And it does it while re-using the LRH of the host, so all VMs and the
host are now forced to share a TID space, yes I know.

> There is a chance that two different VMs can generate the same TID!
> Encoding the slave (VM) number in the packet actually guarantees
> uniqueness here.

Virtualizing the TID in the driver would be fine, but it must
virtualize all the TIDs (even those generated by the HOST).

Just blindly assuming the host doesn't generate TID's that overlap
with the virtualization process is a bug.

> There is nothing wrong with modifying the TID in a reversible way in
> order to: a. guarantee uniqueness b. identify the VM which should
> receive the response packet

Sure, as long as *all* TID's sharing a LRH are vitalized like this.

> The problem was created when the agent-id numbers started to use the
> most-significant byte (thus making the MSB slave-id addition
> impossible).

It hasn't always been this way? What commit?