RE: [PATCH] RDMA/cma: Make CM response timeout and # CM retries configurable

From: Parav Pandit
Date: Mon Mar 04 2019 - 01:27:36 EST




> -----Original Message-----
> From: Leon Romanovsky <leon@xxxxxxxxxx>
> Sent: Saturday, February 23, 2019 2:50 AM
> To: Doug Ledford <dledford@xxxxxxxxxx>
> Cc: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>; Jason Gunthorpe
> <jgg@xxxxxxxx>; Håkon Bugge <haakon.bugge@xxxxxxxxxx>; Parav Pandit
> <parav@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] RDMA/cma: Make CM response timeout and # CM
> retries configurable
>
> On Fri, Feb 22, 2019 at 12:51:55PM -0500, Doug Ledford wrote:
> >
> >
> > > On Feb 22, 2019, at 12:14 PM, Steve Wise
> <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > > On 2/22/2019 10:36 AM, Jason Gunthorpe wrote:
> > >> On Sun, Feb 17, 2019 at 06:09:09PM +0100, Håkon Bugge wrote:
> > >>> During certain workloads, the default CM response timeout is too
> > >>> short, leading to excessive retries. Hence, make it configurable
> > >>> through sysctl. While at it, also make number of CM retries
> > >>> configurable.
> > >>>
> > >>> The defaults are not changed.
> > >>>
> > >>> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> > >>> ---
> > >>>  drivers/infiniband/core/cma.c | 51 ++++++++++++++++++++++++++++++-----
> > >>>  1 file changed, 44 insertions(+), 7 deletions(-)
> > >>>
> > >>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> > >>> index c43512752b8a..ce99e1cd1029 100644
> > >>> --- a/drivers/infiniband/core/cma.c
> > >>> +++ b/drivers/infiniband/core/cma.c
> > >>> @@ -43,6 +43,7 @@
> > >>>  #include <linux/inetdevice.h>
> > >>>  #include <linux/slab.h>
> > >>>  #include <linux/module.h>
> > >>> +#include <linux/sysctl.h>
> > >>>  #include <net/route.h>
> > >>>
> > >>>  #include <net/net_namespace.h>
> > >>> @@ -68,13 +69,46 @@ MODULE_AUTHOR("Sean Hefty");
> > >>>  MODULE_DESCRIPTION("Generic RDMA CM Agent");
> > >>>  MODULE_LICENSE("Dual BSD/GPL");
> > >>>
> > >>> -#define CMA_CM_RESPONSE_TIMEOUT 20
> > >>>  #define CMA_QUERY_CLASSPORT_INFO_TIMEOUT 3000
> > >>> -#define CMA_MAX_CM_RETRIES 15
> > >>>  #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24)
> > >>>  #define CMA_IBOE_PACKET_LIFETIME 18
> > >>>  #define CMA_PREFERRED_ROCE_GID_TYPE IB_GID_TYPE_ROCE_UDP_ENCAP
> > >>>
> > >>> +#define CMA_DFLT_CM_RESPONSE_TIMEOUT 20
> > >>> +static int cma_cm_response_timeout = CMA_DFLT_CM_RESPONSE_TIMEOUT;
> > >>> +static int cma_cm_response_timeout_min = 8;
> > >>> +static int cma_cm_response_timeout_max = 31;
> > >>> +#undef CMA_DFLT_CM_RESPONSE_TIMEOUT
> > >>> +
> > >>> +#define CMA_DFLT_MAX_CM_RETRIES 15
> > >>> +static int cma_max_cm_retries = CMA_DFLT_MAX_CM_RETRIES;
> > >>> +static int cma_max_cm_retries_min = 1;
> > >>> +static int cma_max_cm_retries_max = 100;
> > >>> +#undef CMA_DFLT_MAX_CM_RETRIES
> > >>> +
> > >>> +static struct ctl_table_header *cma_ctl_table_hdr;
> > >>> +static struct ctl_table cma_ctl_table[] = {
> > >>> +	{
> > >>> +		.procname = "cma_cm_response_timeout",
> > >>> +		.data = &cma_cm_response_timeout,
> > >>> +		.maxlen = sizeof(cma_cm_response_timeout),
> > >>> +		.mode = 0644,
> > >>> +		.proc_handler = proc_dointvec_minmax,
> > >>> +		.extra1 = &cma_cm_response_timeout_min,
> > >>> +		.extra2 = &cma_cm_response_timeout_max,
> > >>> +	},
> > >>> +	{
> > >>> +		.procname = "cma_max_cm_retries",
> > >>> +		.data = &cma_max_cm_retries,
> > >>> +		.maxlen = sizeof(cma_max_cm_retries),
> > >>> +		.mode = 0644,
> > >>> +		.proc_handler = proc_dointvec_minmax,
> > >>> +		.extra1 = &cma_max_cm_retries_min,
> > >>> +		.extra2 = &cma_max_cm_retries_max,
> > >>> +	},
> > >>> +	{ }
> > >>> +};
> > >> Is sysctl the right approach here? Should it be rdma tool instead?
> > >>
> > >> Jason
> > >
> > > There are other rdma sysctls currently: net.rdma_ucm.max_backlog
> > > and net.iw_cm.default_backlog. The core network stack seems to use
> > > sysctl, not the ip tool, to set what are basically globals.
> > >
> > > To use rdma tool, we'd have to have some concept of a "module"
> > > object, I guess. I.e., there are dev, link, and resource rdma tool
> > > objects currently. But these cma timeout settings are really not per
> > > dev, link, nor a resource. Maybe we'd have just a "core" object: rdma
> > > core set cma_max_cm_retries min 8 max 30.
> >
> > I don't know, I think you make a fairly good argument for leaving it as a
> > sysctl. We have infrastructure in place for admins to set persistent sysctl
> > settings. The per device/link settings need something different because link
> > names and such can change. Since these are globals, I'd leave them where
> > they are.
> >
>
> I have patches from Parav which extend rdmatool to set stack-wide global
> parameters, something like "rdma system ...", so the option to set global
> module parameters through rdmatool, e.g. "rdma system cma set ...", exists.
> But I'm not sure it is the right thing to do.
>

I think we should bring back rdma_nl_register(RDMA_NL_RDMA_CM, cma_cb_table), which was removed as part of the ID stats removal, for the following reasons:
1. An rdma netlink command auto-loads the module.
2. We don't need to write any extra code to call register_net_sysctl() in each netns; the netns of the caller's skb is used to read/write the response_timeout value in 'struct cma_pernet' (see the sketch below).
3. The last sysctl added to ipv6 was in 2017 (net/ipv6/addrconf.c); for ipv4 it was 2018.
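
To make (2) concrete, here is a rough sketch of what a doit handler in cma.c could look like. This is only an illustration, not a proposed patch: the attribute names (CMA_NL_ATTR_*) and the cm_response_timeout field in 'struct cma_pernet' don't exist today and would have to be added; cma_pernet(), sock_net() and the netlink helpers are the existing ones.

#include <rdma/rdma_netlink.h>
#include <net/netlink.h>

/* Hypothetical attribute space; none of this is existing UAPI. */
enum {
	CMA_NL_ATTR_UNSPEC,
	CMA_NL_ATTR_CM_RESPONSE_TIMEOUT,	/* u8, 8..31 as in the sysctl patch */
	CMA_NL_ATTR_MAX = CMA_NL_ATTR_CM_RESPONSE_TIMEOUT,
};

static const struct nla_policy cma_nl_policy[CMA_NL_ATTR_MAX + 1] = {
	[CMA_NL_ATTR_CM_RESPONSE_TIMEOUT] = { .type = NLA_U8 },
};

static int cma_nl_set_timeout_doit(struct sk_buff *skb, struct nlmsghdr *nlh,
				   struct netlink_ext_ack *extack)
{
	/* The sender's netns comes from the skb, so no per-netns
	 * register_net_sysctl() is needed; the value lives in the
	 * per-net state (new field assumed in struct cma_pernet).
	 */
	struct cma_pernet *pernet = cma_pernet(sock_net(skb->sk));
	struct nlattr *tb[CMA_NL_ATTR_MAX + 1];
	u8 timeout;
	int ret;

	ret = nlmsg_parse(nlh, 0, tb, CMA_NL_ATTR_MAX, cma_nl_policy, extack);
	if (ret)
		return ret;
	if (!tb[CMA_NL_ATTR_CM_RESPONSE_TIMEOUT])
		return -EINVAL;

	timeout = nla_get_u8(tb[CMA_NL_ATTR_CM_RESPONSE_TIMEOUT]);
	if (timeout < 8 || timeout > 31)	/* same bounds as the sysctl version */
		return -EINVAL;

	pernet->cm_response_timeout = timeout;
	return 0;
}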

Currently rdma_cm/rdma_ucma is configured through configfs and sysctl, and we are adding netlink-based system parameters to ib_core.

We already have three clients and the infrastructure built around rdma_nl_register(), so hooking into netlink gives a unified way to set rdma parameters.
Let's just use netlink for any new parameters unless it is really not doable.
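
The hookup itself would mirror what the existing clients (nldev, LS, iwcm) already do. Again only a sketch: RDMA_NL_RDMA_CM and its op table would have to be brought back into rdma_netlink.h, and RDMA_NL_RDMA_CM_SET_TIMEOUT is a made-up op name.

static const struct rdma_nl_cbs cma_cb_table[RDMA_NL_RDMA_CM_NUM_OPS] = {
	[RDMA_NL_RDMA_CM_SET_TIMEOUT] = {
		.doit = cma_nl_set_timeout_doit,
		.flags = RDMA_NL_ADMIN_PERM,	/* writes require CAP_NET_ADMIN */
	},
};

/* in cma_init() */
rdma_nl_register(RDMA_NL_RDMA_CM, cma_cb_table);

/* in cma_cleanup() */
rdma_nl_unregister(RDMA_NL_RDMA_CM);

/* Plus a MODULE_ALIAS_RDMA_NETLINK() alias for RDMA_NL_RDMA_CM, which is
 * what lets the rdma netlink core request_module() rdma_cm on demand
 * (reason 1 above).
 */

Userspace-wise this is exactly where Leon's "rdma system cma set ..." idea would plug in, rather than adding a second sysctl/configfs path.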