Re: [PATCH IB/core 2/2] IB/cm: Send authentic pkey in REQ msg and check eligibility of the pkeys
From: HÃkon Bugge
Date: Wed May 16 2018 - 02:15:06 EST
> On 15 May 2018, at 21:04, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Tue, May 15, 2018 at 08:11:09PM +0200, HÃkon Bugge wrote:
>>> On 15 May 2018, at 02:38, Hal Rosenstock <hal@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> On 5/14/2018 5:02 PM, Jason Gunthorpe wrote:
>>>> On Thu, May 10, 2018 at 05:16:28PM +0200, HÃkon Bugge wrote:
>>>>
>>>>> We are talking about two things here. The PKey in the BTH and the
>>>>> PKey in the CM REQ payload. They differ.
>>>>>
>>>>> I am out of office, but if my memory serves me correct, the PKey in
>>>>> the BTH in the MAD packet will be the default PKey. Further, we have
>>>>> per IBTA:
>>>>
>>>> This sounds like a Linux bug.
>>>>
>>>> Linux does not do a PR to get a reversible path dedicated to the GMP> so it always uses the data flow path, thus the GMP path paramenters
>>>> and those in the REQ should always exactly match.
>>>>
>>>> Where is Linux setting the BTH.PKey and how did it choose to use the
>>>> default pkey? Lets fix that at least for sure.
>>
>> Linux isnât. The BTH.PKey is inserted by the HCA (hw or fw) coming from the P_Key table (10.9.2 in IBTA), selected by a pkey_index associated with the QP.
>>
>> As per C10-133: Packets sent from the Send Queue of a GSI QP shall
>> attach a P_Key associated with that QP, just as a P_Key is
>> associated with nonmanagement QPs
>
> That language doesn't mean the PKey is forced to the default, it says
> the pkey is programmable.
>
> Linux implemented programmable PKeys for the GSI by using the work
> request, so it deviates from what the spec imagined (which was
> probably one GSI QP per PKey).
>
> See ib_create_send_mad for instance:
>
> mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe;
> mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list;
> mad_send_wr->send_wr.wr.num_sge = 2;
> mad_send_wr->send_wr.wr.opcode = IB_WR_SEND;
> mad_send_wr->send_wr.wr.send_flags = IB_SEND_SIGNALED;
> mad_send_wr->send_wr.remote_qpn = remote_qpn;
> mad_send_wr->send_wr.remote_qkey = IB_QP_SET_QKEY;
> mad_send_wr->send_wr.pkey_index = pkey_index;
>
> The pkey_index associated with the QP1 is ignored:
>
> /*
> * PKey index for QP1 is irrelevant but
> * one is needed for the Reset to Init transition
> */
> attr->qp_state = IB_QPS_INIT;
> attr->pkey_index = pkey_index;
> attr->qkey = (qp->qp_num == 0) ? 0 : IB_QP1_QKEY;
> ret = ib_modify_qp(qp, attr, IB_QP_STATE |
> IB_QP_PKEY_INDEX | IB_QP_QKEY);
>
> Since is is present in every WR.
>
>>>> Basically this means that any pkey in the REQ could randomly be the
>>>> full or limited value, and that in-of-itself has not bearing on the
>>>> connection.
>>>>
>>>> So it is quite wrong to insist that the pkey be limited or full when
>>>> processing the REQ. The end port is expected to match against the
>>>> local table.
>>>
>>> Note that there is thorny issue with shared (physical) port
>>> virtualization. In shared port virtualization, the VF pkey assignment is
>>> a local matter. Only thing SM knows is the physical port pkeys where
>>> both full and limited membership of same partition is possible. It is
>>> conceivable that CM REQ contains limited partition key between 2 limited
>>> VFs and for that a new REJ reason code appears to be needed.
>>
>> +1
>
> This is not a difficult issue.
>
> If the GMP is properly tagged with the right PKey then it will never
> be delivered to the VM if the VM does not have the PKey in the
> table.
Not quite right. For the shared port model, a GMP will (most probably) be accepted by the physical port, due to:
IBTA 3.9.5: QP1 is a member of all of the portâs partitions (i.e., can accept any packet specifying a P_Key contained in the portâs P_Key table).
and
C9-42: If the destination QP is QP1, the BTH:P_Key shall be compared to the set of P_Keys associated with the port on which the packet arrived. If the P_Key matches any of the keys associated with the port, it shall be considered valid.
So, if the GMP is destined to VM1, which is a limited member, but we have another VM2 which is a full member of the same partition, the GMP will pass the HCAâs PKey check.
> It is up to the hypervisor to block GMPs that have Pkeys not in
> the virtual PKey table of the VF.
The packet is received by the HCA and stripped from IB headers, in particular the BTH. How can the "hypervisor" block it when its doesnât have access to the GMPâs BTH.PKey?
> The only time you could need a new REJ code is if the GMP is using a
> PKey different from the REQ - which is a pretty goofy thing to do
> considering this VM case.
Its goofy. In the CX-3 shared port model, the BTH.PKey is the default one and the REQ.PKey is the full one even if the sending VMâs port only is a limited member. This patch series fixes the last issue.
> Remember the SM doesn't know what Pkeys are in the VM, so it is
> basically impossible for the REQ side to reliably select two different
> pkeys and know that they will bothmake it to the VM.
The active side should use the "authentic" PKey in the REQ message. That is the one that would be used in BTH.PKey when communication has been established. This is implemented by this patch series.
Not sure what you mean by "reliably select two different pkeys". The CM REQ message contains one PKey.
> One pkey could be done reliably if it matched ipoib, for instance, but
> two really cannot in the general case.
>
> So again - the bug here is that the GMP is being sent with the wrong
> pkey. Fix that, then see what is left to fix..
See above, not sure how that could be implemented. And if it is solved by the HCA trapping the GMP due to the PKey check, it doesnât help me, as the purpose of the series is to avoid (excessive) PKey traps sent to the SM.
> If I recall there were bugs here in mlx drivers, where the driver sent
> with the wrong Pkey. I think this has actually been fixed now, so
> please check the upstream kernel to be sure the Pkey is not what it is
> supposed to be.
Let me get back to you with some ibdumps here.
HÃkon