Re: Re: Re: Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64

From: Bernard Metzler
Date: Fri Jul 12 2019 - 13:40:58 EST


-----"Jason Gunthorpe" <jgg@xxxxxxxx> wrote: -----

>To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>From: "Jason Gunthorpe" <jgg@xxxxxxxx>
>Date: 07/12/2019 05:33PM
>Cc: "Arnd Bergmann" <arnd@xxxxxxxx>, "Doug Ledford"
><dledford@xxxxxxxxxx>, "Peter Zijlstra" <peterz@xxxxxxxxxxxxx>,
>linux-rdma@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
>Subject: [EXTERNAL] Re: Re: Re: Re: [PATCH] rdma/siw: avoid
>smp_store_mb() on a u64
>
>On Fri, Jul 12, 2019 at 03:24:09PM +0000, Bernard Metzler wrote:
>>
>> >To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>> >From: "Jason Gunthorpe" <jgg@xxxxxxxx>
>> >Date: 07/12/2019 04:43PM
>> >Cc: "Arnd Bergmann" <arnd@xxxxxxxx>, "Doug Ledford"
>> ><dledford@xxxxxxxxxx>, "Peter Zijlstra" <peterz@xxxxxxxxxxxxx>,
>> >linux-rdma@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
>> >Subject: [EXTERNAL] Re: Re: Re: [PATCH] rdma/siw: avoid
>> >smp_store_mb() on a u64
>> >
>> >On Fri, Jul 12, 2019 at 02:35:50PM +0000, Bernard Metzler wrote:
>> >
>> >> >This looks wrong to me.. a userspace notification re-arm cannot
>be
>> >> >lost, so have a split READ/TEST/WRITE sequence can't possibly
>> >work?
>> >> >
>> >> >I'd expect an atomic test and clear here?
>> >>
>> >> We cannot avoid the case that the application re-arms the
>> >> CQ only after a CQE got placed. That is why folks are polling
>the
>> >> CQ once after re-arming it - to make sure they do not miss the
>> >> very last and single CQE which would have produced a CQ event.
>> >
>> >That is different, that is re-arm happing after a CQE placement
>and
>> >this can't be fixed.
>> >
>> >What I said is that a re-arm from userspace cannot be lost. So you
>> >can't blindly clear the arm flag with the WRITE_ONCE. It might be
>OK
>> >beacuse of the if, but...
>> >
>> >It is just goofy to write it without a 'test and clear' atomic. If
>> >the
>> >writer side consumes the notify it should always be done
>atomically.
>> >
>> Hmmm, I don't yet get why we should test and clear atomically, if
>we
>> clear anyway - is it because we want to avoid clearing a re-arm
>which
>> happens just after testing and before clearing?
>
>It is just clearer as to the intent..
>
>Are you trying to optimize away an atomic or something? That might
>work better as a dual counter scheme.
>
>> Another complication -- test_and_set_bit() operates on a single
>> bit, but we have to test two bits, and reset both, if one is
>> set.
>
>Why are two bits needed to represent armed and !armed?
>
>Jason

It is because there are two levels a CQ can be armed:

#include <infiniband/verbs.h>

int ibv_req_notify_cq(struct ibv_cq *cq, int solicited_only);

If we kick the CQ handler, we have to clear the whole
thing. The user later again decides how he wants to get it
re-armed...SOLICITED completions only, or ALL signaled.

Best,
Bernard.

>
>