RE: [PATCH v3 4/6] RDMA/cm: Brute force GFP_NOIO

From: Naveen Mamindlapalli
Date: Thu May 23 2024 - 02:46:08 EST



> -----Original Message-----
> From: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> Sent: Wednesday, May 22, 2024 7:25 PM
> To: linux-rdma@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx; rds-devel@xxxxxxxxxxxxxx
> Cc: Jason Gunthorpe <jgg@xxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>;
> Saeed Mahameed <saeedm@xxxxxxxxxx>; Tariq Toukan <tariqt@xxxxxxxxxx>;
> David S . Miller <davem@xxxxxxxxxxxxx>; Eric Dumazet
> <edumazet@xxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni
> <pabeni@xxxxxxxxxx>; Tejun Heo <tj@xxxxxxxxxx>; Lai Jiangshan
> <jiangshanlai@xxxxxxxxx>; Allison Henderson <allison.henderson@xxxxxxxxxx>;
> Manjunath Patil <manjunath.b.patil@xxxxxxxxxx>; Mark Zhang
> <markzhang@xxxxxxxxxx>; Håkon Bugge <haakon.bugge@xxxxxxxxxx>; Chuck
> Lever <chuck.lever@xxxxxxxxxx>; Shiraz Saleem <shiraz.saleem@xxxxxxxxx>;
> Yang Li <yang.lee@xxxxxxxxxxxxxxxxx>
> Subject: [PATCH v3 4/6] RDMA/cm: Brute force GFP_NOIO
>
> In ib_cm_init(), we call memalloc_noio_{save,restore} in a parenthetic fashion
> when enabled by the module parameter force_noio.
>
> This is done to conditionally enable ib_cm to work aligned with block I/O devices.
> Any work queued later on work-queues created during module initialization will
> inherit the PF_MEMALLOC_{NOIO,NOFS} flag(s), due to commit ("workqueue:
> Inherit NOIO and NOFS alloc flags").
>
> We do this in order to enable ULPs using the RDMA stack to be used as a
> network block I/O device. This is to support a filesystem on top of a raw
> block device which uses said ULP(s) and the RDMA stack as the network
> transport layer.
>
> Under intense memory pressure, we get memory reclaims. Assume the filesystem
> reclaims memory, goes to the raw block device, which calls into the ULP in
> question, which calls the RDMA stack. Now, if regular GFP_KERNEL allocations
> in ULP or the RDMA stack require reclaims to be fulfilled, we end up in a circular
> dependency.
>
> We break this circular dependency by:
>
> 1. Force all allocations in the ULP and the relevant RDMA stack to use
> GFP_NOIO, by means of a parenthetic use of
> memalloc_noio_{save,restore} on all relevant entry points.
>
> 2. Make sure work-queues inherit current->flags
> wrt. PF_MEMALLOC_{NOIO,NOFS}, such that work executed on the
> work-queue inherits the same flag(s).
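
For anyone following along, the parenthetic save/restore in step 1 can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: every sketch_* name and SKETCH_PF_MEMALLOC_NOIO is made up for illustration, a plain global stands in for current->flags, and none of this is the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
#define SKETCH_PF_MEMALLOC_NOIO 0x1u

static unsigned int sketch_task_flags;	/* models current->flags */

/* Models memalloc_noio_save(): set the flag, return its previous state. */
static unsigned int sketch_noio_save(void)
{
	unsigned int old = sketch_task_flags & SKETCH_PF_MEMALLOC_NOIO;

	sketch_task_flags |= SKETCH_PF_MEMALLOC_NOIO;
	return old;
}

/* Models memalloc_noio_restore(): put the flag back to the saved state. */
static void sketch_noio_restore(unsigned int old)
{
	sketch_task_flags = (sketch_task_flags & ~SKETCH_PF_MEMALLOC_NOIO) | old;
}

/*
 * Models an init function that brackets its body with save/restore when
 * force_noio is set. Returns whether the flag was set inside the bracket.
 */
static bool sketch_init(bool force_noio)
{
	unsigned int noio_flags = 0;
	bool seen;

	if (force_noio)
		noio_flags = sketch_noio_save();

	/* Anything done here (e.g. creating work-queues) sees the flag. */
	seen = sketch_task_flags & SKETCH_PF_MEMALLOC_NOIO;

	if (force_noio) {
		sketch_noio_restore(noio_flags);
		/* After restore, the flag is back to its pre-save state. */
		assert((sketch_task_flags & SKETCH_PF_MEMALLOC_NOIO) == noio_flags);
	}

	return seen;
}
```

Because save returns the previous state rather than unconditionally clearing on restore, the bracket nests: an outer scope that already set the flag still has it set after an inner bracket ends.
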
>
> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> ---
> drivers/infiniband/core/cm.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 07fb8d3c037f0..767eec38eb57d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -22,6 +22,7 @@
> #include <linux/workqueue.h>
> #include <linux/kdev_t.h>
> #include <linux/etherdevice.h>
> +#include <linux/sched/mm.h>
>
> #include <rdma/ib_cache.h>
> #include <rdma/ib_cm.h>
> @@ -35,6 +36,11 @@ MODULE_DESCRIPTION("InfiniBand CM");
> MODULE_LICENSE("Dual BSD/GPL");
>
> #define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
> +
> +static bool cm_force_noio;
> +module_param_named(force_noio, cm_force_noio, bool, 0444);
> +MODULE_PARM_DESC(force_noio, "Force the use of GFP_NOIO (Y/N)");
> +
> static const char * const ibcm_rej_reason_strs[] = {
> [IB_CM_REJ_NO_QP] = "no QP",
> [IB_CM_REJ_NO_EEC] = "no EEC",
> @@ -4504,6 +4510,10 @@ static void cm_remove_one(struct ib_device *ib_device, void *client_data)
> static int __init ib_cm_init(void)
> {
> int ret;
> + unsigned int noio_flags;

minor: please follow reverse xmas tree order for the local variable
declarations, i.e. declare noio_flags (the longer line) above ret.

> +
> + if (cm_force_noio)
> + noio_flags = memalloc_noio_save();
>
> INIT_LIST_HEAD(&cm.device_list);
> rwlock_init(&cm.device_lock);
> @@ -4527,10 +4537,13 @@ static int __init ib_cm_init(void)
> if (ret)
> goto error3;
>
> - return 0;
> + goto error2;
> error3:
> destroy_workqueue(cm.wq);
> error2:
> + if (cm_force_noio)
> + memalloc_noio_restore(noio_flags);
> +
> return ret;
> }
>
> --
> 2.31.1
>