[PATCH rdma-next 00/15] Fix locking around cm_id.state in the ib_cm

From: Leon Romanovsky
Date: Tue Mar 10 2020 - 05:25:53 EST


From: Leon Romanovsky <leonro@xxxxxxxxxxxx>

>From Jason:

cm_id.state is a non-atomic value that must always be read and written
under lock, or while the thread has the only pointer to the cm_id.

Critically, during MAD handling the cm_id.state is used to control when
MAD handlers can run, and in turn what data they can touch. Without
locking, an assignment to state can immediately allow concurrent MAD
handlers to execute, potentially creating a mess.

Several of these cases only risk load/store tearing, but create very
confusing code. For instance changing the state from IB_CM_IDLE to
IB_CM_LISTEN doesn't allow any MAD handlers to run in either state, but a
superficial audit would suggest that it is not locked properly.

This loose methodology has allowed two bugs to creep in. After creating an
ID the code did not lock the state transition, apparently mistakenly
assuming that the new ID could not be used concurrently. However, the ID
is immediately placed in the xarray and so a carefully crafted network
sequence could trigger races with the unlocked stores.

The main solution to many of these problems is to use the xarray to create
a two stage add - the first reserves the ID and the second publishes the
pointer. The second stage is either omitted entirely or moved after the
newly created ID is setup.

Where it is trivial to do so other places directly put the state
manipulation under lock, or add an assertion that it is, in fact, under
lock.

This also removes a number of places where the state is being read under
lock, then the lock dropped, reacquired and state tested again.

There remain other issues related to missing locking on cm_id data.

Thanks

------------------------------------------------------------------------
It is based on rdma-next + rdma-rc patch c14dfddbd869
("RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()")

Jason Gunthorpe (15):
RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id()
RDMA/cm: Fix checking for allowed duplicate listens
RDMA/cm: Remove a race freeing timewait_info
RDMA/cm: Make the destroy_id flow more robust
RDMA/cm: Simplify establishing a listen cm_id
RDMA/cm: Read id.state under lock when doing pr_debug()
RDMA/cm: Make it clear that there is no concurrency in
cm_sidr_req_handler()
RDMA/cm: Make it clearer how concurrency works in cm_req_handler()
RDMA/cm: Add missing locking around id.state in cm_dup_req_handler
RDMA/cm: Add some lockdep assertions for cm_id_priv->lock
RDMA/cm: Allow ib_send_cm_dreq() to be done under lock
RDMA/cm: Allow ib_send_cm_drep() to be done under lock
RDMA/cm: Allow ib_send_cm_rej() to be done under lock
RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock
RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy

drivers/infiniband/core/cm.c | 732 ++++++++++++++++++++---------------
1 file changed, 420 insertions(+), 312 deletions(-)

--
2.24.1