Re: [bug report] RDMA/iwpm: reentrant iwpm hello message
From: Leon Romanovsky
Date: Wed Jan 08 2025 - 10:14:31 EST
On Mon, Dec 30, 2024 at 08:28:28PM +0200, Leon Romanovsky wrote:
> On Wed, Dec 25, 2024 at 09:58:35AM +0800, Lin Ma wrote:
> >
> > >
> > > Do you have a reproducer for that?
> > >
> >
> > Yep, I attached the PoC code, please enable CONFIG_INFINIBAND
> > for testing.
>
> Thanks a lot for the repro. I wonder why iWARP folks never complained
> about it. Anyway, I have a local fix, but need to test it before sending;
> will do after the New Year holidays.
I was wrong; there is no simple fix for this issue.
The root cause of these lockdep warnings is nested locking in iWARP.
IWCM registers its handlers as dump callbacks but uses them as doit ones. See the following FIXME line:
	/* FIXME: Convert IWCM to properly handle doit callbacks */
	if ((nlh->nlmsg_flags & NLM_F_DUMP) || index == RDMA_NL_IWCM) {
		struct netlink_dump_control c = {
			.dump = cb_table[op].dump,
		};
		if (c.dump)
			err = netlink_dump_start(skb->sk, skb, nlh, &c);
In our case,
  cb_table[op].dump ->
    iwpm_hello_cb ->
      iwpm_send_hello ->
        rdma_nl_unicast() <---- this shouldn't be in dump callbacks.
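For anyone following along, the routing condition quoted above can be modeled in user space. This is only a minimal sketch, not kernel code: takes_dump_path() is a hypothetical helper, the MODEL_NLM_F_* constants are local copies of the netlink UAPI flag values, and the type macros follow the ones in the quoted PoC. It illustrates that the PoC's HELLO request sets no dump flag, yet is still routed through the dump path because its client index is RDMA_NL_IWCM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the netlink flag values (include/uapi/linux/netlink.h). */
#define MODEL_NLM_F_REQUEST 0x01
#define MODEL_NLM_F_ACK     0x04
#define MODEL_NLM_F_DUMP    (0x100 | 0x200) /* NLM_F_ROOT | NLM_F_MATCH */

/* Same encoding as the macros in the PoC: client in bits 10-15, op in bits 0-9. */
#define RDMA_NL_GET_CLIENT(type)     (((type) >> 10) & ((1 << 6) - 1))
#define RDMA_NL_GET_TYPE(client, op) (((client) << 10) + (op))

#define RDMA_NL_IWCM       2
#define RDMA_NL_NLDEV      5
#define RDMA_NL_IWPM_HELLO 8

/*
 * Hypothetical helper mirroring the quoted condition: a message goes
 * through netlink_dump_start() if it carries a dump flag OR if it is
 * addressed to the IWCM client, regardless of its flags.
 */
bool takes_dump_path(uint16_t nlmsg_type, uint16_t nlmsg_flags)
{
	unsigned int index = RDMA_NL_GET_CLIENT(nlmsg_type);

	return (nlmsg_flags & MODEL_NLM_F_DUMP) || index == RDMA_NL_IWCM;
}
```

With the PoC's values, takes_dump_path(RDMA_NL_GET_TYPE(RDMA_NL_IWCM, RDMA_NL_IWPM_HELLO), MODEL_NLM_F_REQUEST | MODEL_NLM_F_ACK) is true, while the same flags sent to RDMA_NL_NLDEV would not take the dump path.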
The right and only viable solution is to convert all IWCM handlers to use .doit callbacks.
Would any iWARP developer or user volunteer for this conversion?
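To make the direction a bit more concrete, here is a rough user-space model of the intended split. It is a sketch under stated assumptions, not a patch: every model_* name is hypothetical, the flag and error values are plain local constants, and the real conversion would have to deal with the actual kernel callback signatures. The point is only that a request without a dump flag reaches a doit-style handler directly, so a handler that sends a reply no longer runs inside a dump callback.

```c
/* Hypothetical stand-ins; none of these names exist in the kernel. */
enum { MODEL_F_DUMP = 0x300, MODEL_EINVAL = 22 };

struct model_msg {
	unsigned int flags;
};

/* One table entry per op, with separate doit and dump handlers. */
struct model_cbs {
	int (*doit)(const struct model_msg *msg);
	int (*dump)(const struct model_msg *msg);
};

/* Counts replies "sent" by the handler, standing in for a unicast reply. */
static int replies_sent;

/* HELLO modeled as a doit handler: it replies once and returns. */
static int model_hello_doit(const struct model_msg *msg)
{
	(void)msg;
	replies_sent++;
	return 0;
}

/* HELLO registers only a doit handler, so it has no dump callback. */
static const struct model_cbs model_hello_entry = {
	.doit = model_hello_doit,
};

/* Dump requests go to .dump, everything else goes to .doit. */
int model_dispatch(const struct model_cbs *cbs, const struct model_msg *msg)
{
	if (msg->flags & MODEL_F_DUMP)
		return cbs->dump ? cbs->dump(msg) : -MODEL_EINVAL;
	return cbs->doit ? cbs->doit(msg) : -MODEL_EINVAL;
}
```

In this model a plain request (flags 0x05) dispatched to model_hello_entry calls model_hello_doit() once, while a request carrying a dump flag is rejected because HELLO has no dump handler.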
Thanks
>
> Thanks again.
>
> >
> > Thanks
> > By the way, Merry Christmas~
> >
>
> > // gcc poc.c -static -o poc.elf -lmnl
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <stdint.h>
> > #include <string.h>
> > #include <stdbool.h>
> >
> > #include <libmnl/libmnl.h>
> >
> > #define PAGE_SIZE 0x1000
> > #define RDMA_NL_GET_CLIENT(type) ((type & (((1 << 6) - 1) << 10)) >> 10)
> > #define RDMA_NL_GET_OP(type) (type & ((1 << 10) - 1))
> > #define RDMA_NL_GET_TYPE(client, op) ((client << 10) + op)
> > #define RDMA_NL_IWCM (2)
> > #define IWPM_NLA_HELLO_ABI_VERSION (1)
> >
> > enum
> > {
> >     RDMA_NL_IWPM_REG_PID = 0,
> >     RDMA_NL_IWPM_ADD_MAPPING,
> >     RDMA_NL_IWPM_QUERY_MAPPING,
> >     RDMA_NL_IWPM_REMOVE_MAPPING,
> >     RDMA_NL_IWPM_REMOTE_INFO,
> >     RDMA_NL_IWPM_HANDLE_ERR,
> >     RDMA_NL_IWPM_MAPINFO,
> >     RDMA_NL_IWPM_MAPINFO_NUM,
> >     RDMA_NL_IWPM_HELLO,
> >     RDMA_NL_IWPM_NUM_OPS
> > };
> >
> > int main(int argc, char const *argv[])
> > {
> >     struct mnl_socket *sock;
> >     struct nlmsghdr *nlh;
> >     char buf[PAGE_SIZE];
> >     int err;
> >
> >     sock = mnl_socket_open(NETLINK_RDMA);
> >     if (sock == NULL)
> >     {
> >         perror("mnl_socket_open");
> >         exit(-1);
> >     }
> >
> >     nlh = mnl_nlmsg_put_header(buf);
> >     nlh->nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_IWCM, RDMA_NL_IWPM_HELLO);
> >     nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
> >     nlh->nlmsg_seq = 1;
> >     nlh->nlmsg_pid = 0;
> >
> >     // static const struct nla_policy hello_policy[IWPM_NLA_HELLO_MAX] = {
> >     //     [IWPM_NLA_HELLO_ABI_VERSION] = { .type = NLA_U16 }
> >     // };
> >     mnl_attr_put_u16(nlh, IWPM_NLA_HELLO_ABI_VERSION, 3);
> >
> >     err = mnl_socket_sendto(sock, buf, nlh->nlmsg_len);
> >     if (err < 0)
> >     {
> >         perror("mnl_socket_sendto");
> >         exit(-1);
> >     }
> >     return 0;
> > }
>
>