[PATCH net v2 0/3] net/smc: Fixes for race in smc link group termination

From: Wen Gu
Date: Thu Jan 13 2022 - 03:36:55 EST


We recently encountered some crashes caused by a race between
accessing and freeing a link/link group during abnormal smc link
group termination. The crashes can be reproduced by triggering
frequent abnormal link group terminations, e.g. by setting RNICs up/down.

This set of patches tries to fix this by extending the life cycle
of link/link group to ensure that they are not referenced after
being cleared or freed.
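
As a rough illustration of the idea (not the code in these patches), the
userspace sketch below shows how a reference count can keep an object
valid until its last user is done, even if termination happens first.
All names here (demo_lgr, demo_lgr_hold(), demo_lgr_put(), freed_flag)
are hypothetical and only loosely mirror the hold/put naming used in
this series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a link group; not struct smc_link_group. */
struct demo_lgr {
	atomic_int refcnt;	/* number of current holders */
	bool terminated;	/* set on termination; memory stays valid */
	int *freed_flag;	/* demo hook: set to 1 when memory is released */
};

static struct demo_lgr *demo_lgr_create(int *freed_flag)
{
	struct demo_lgr *lgr = malloc(sizeof(*lgr));

	if (!lgr)
		return NULL;
	atomic_init(&lgr->refcnt, 1);	/* the creator's reference */
	lgr->terminated = false;
	lgr->freed_flag = freed_flag;
	return lgr;
}

/* Take a reference; the caller must already hold a valid one. */
static void demo_lgr_hold(struct demo_lgr *lgr)
{
	atomic_fetch_add(&lgr->refcnt, 1);
}

/* Drop a reference; the last put releases the memory. */
static void demo_lgr_put(struct demo_lgr *lgr)
{
	if (atomic_fetch_sub(&lgr->refcnt, 1) == 1) {
		if (lgr->freed_flag)
			*lgr->freed_flag = 1;
		free(lgr);
	}
}

/* Mark the group as terminated and drop the creator's reference. */
static void demo_lgr_terminate(struct demo_lgr *lgr)
{
	lgr->terminated = true;
	demo_lgr_put(lgr);
}
```

In this sketch a connection that called demo_lgr_hold() before
termination can still safely read lgr->terminated afterwards; the
memory is only released by that connection's final demo_lgr_put().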

v1 -> v2:
- Improve some comments.

- Move the code that wakes up the lgrs_deleted wait queue from
smc_lgr_free() to __smc_lgr_free().

- Move the code that wakes up the links_deleted wait queue from
smcr_link_clear() to __smcr_link_clear().

- Move the smc_ibdev_cnt_dec() and put_device() calls from
smcr_link_clear() to __smcr_link_clear().

- Move smc_lgr_put() to the end of __smcr_link_clear().

- Call smc_lgr_put() after the 'out' label in smcr_link_init() when link
initialization fails.

- Modify the location where smc connection holds the lgr or link.

before:
* hold lgr in smc_lgr_register_conn().
* hold link in smcr_lgr_conn_assign_link().
after:
* hold both lgr and link in smc_conn_create().

This makes the hold location symmetrical with the place where smc
connections put the lgr or link, which is smc_conn_free().

- Initialize conn->freed to zero in smc_conn_create().

Wen Gu (3):
net/smc: Resolve the race between link group access and termination
net/smc: Introduce a new conn->lgr validity check helper
net/smc: Resolve the race between SMC-R link access and clear

net/smc/af_smc.c | 6 ++-
net/smc/smc.h | 1 +
net/smc/smc_cdc.c | 3 +-
net/smc/smc_clc.c | 2 +-
net/smc/smc_core.c | 120 +++++++++++++++++++++++++++++++++++++++++------------
net/smc/smc_core.h | 12 ++++++
net/smc/smc_diag.c | 6 +--
7 files changed, 118 insertions(+), 32 deletions(-)

--
1.8.3.1