Re: [PATCH RFC 00/12] RDMA: Support link status events dispatching in ib_core

From: Junxian Huang
Date: Wed Dec 25 2024 - 01:13:25 EST

On 2024/12/24 21:38, Leon Romanovsky wrote:
> On Tue, Dec 24, 2024 at 08:05:26PM +0800, Junxian Huang wrote:
>>
>>
>> On 2024/12/24 18:32, Leon Romanovsky wrote:
>>> On Fri, Nov 22, 2024 at 06:52:56PM +0800, Junxian Huang wrote:
>>>> This series is to integrate a common link status event handler in
>>>> ib_core as this functionality is needed by most drivers and
>>>> implemented in very similar patterns. This is not a new issue but
>>>> a restart of the previous work of our colleagues from several years
>>>> ago, please see [1] and [2].
>>>>
>>>> [1]: https://lore.kernel.org/linux-rdma/1570184954-21384-1-git-send-email-liweihang@xxxxxxxxxxxxx/
>>>> [2]: https://lore.kernel.org/linux-rdma/20200204082408.18728-1-liweihang@xxxxxxxxxx/
>>>>
>>>> With this series, ib_core can handle netdev events of link status,
>>>> i.e. NETDEV_UP, NETDEV_DOWN and NETDEV_CHANGE, and dispatch ib port
>>>> events to ULPs instead of drivers. However some drivers currently
>>>> have some private processing in their handler, rather than simply
>>>> dispatching events. For these drivers, this series provides a new
>>>> ops report_port_event(). If this ops is set, ib_core will call it
>>>> and the events will still be handled in the driver.
>>>>
>>>> Events of LAG devices are also not handled in ib_core as currently
>>>> there is no way to obtain ibdev from upper netdev in ib_core. This
>>>> can be a TODO item once the core has more support for LAG. For
>>>> now mlx5 is the only driver that supports RoCE LAG, and the events
>>>> handling of mlx5 RoCE LAG will remain in mlx5 driver.
>>>>
>>>> In this series:
>>>>
>>>> Patch #1 adds a new helper to query the port num of a netdev
>>>> associated with an ibdev. This is used in the following patch.
>>>>
>>>> Patch #2 adds support for link status events dispatching in ib_core.
>>>>
>>>> Patches #3-#7 remove the link status event handlers in several
>>>> drivers. The port state setting in erdma, rxe and siw is replaced
>>>> with ib_get_curr_port_state(), so their handlers can be removed
>>>> entirely.
>>>>
>>>> Patches #8-#10 add support for the report_port_event() ops in usnic,
>>>> mlx4 and pvrdma, as their current handlers cannot be fully replaced
>>>> by the ib_core handler in patch #2.
>>>>
>>>> Patch #11 adds a check in mlx5 so that only RoCE LAG events are
>>>> handled in the mlx5 driver.
>>>>
>>>> Patch #12 adds a fast path for dispatching link-down events in hns
>>>> by getting notified directly by the hns3 NIC driver.
>>>>
>>>> Yuyu Li (12):
>>>>   RDMA/core: Add ib_query_netdev_port() to query netdev port by IB
>>>>     device.
>>>>   RDMA/core: Support link status events dispatching
>>>>   RDMA/bnxt_re: Remove deliver net device event
>>>>   RDMA/erdma: Remove deliver net device event
>>>>   RDMA/irdma: Remove deliver net device event
>>>>   RDMA/rxe: Remove deliver net device event
>>>>   RDMA/siw: Remove deliver net device event
>>>>   RDMA/usnic: Support report_port_event() ops
>>>>   RDMA/mlx4: Support report_port_event() ops
>>>>   RDMA/pvrdma: Support report_port_event() ops
>>>>   RDMA/mlx5: Handle link status event only for LAG device
>>>>   RDMA/hns: Support fast path for link-down events dispatching
>>>
>>> I took the series, as it is a good thing to remove code duplication
>>> and we have waited long enough.
>>>
>>
>> Thanks Leon.
>>
>> The kernel test robot has reported one warning and one error for
>> this series:
>>
>> https://lore.kernel.org/oe-kbuild-all/202411251625.VrcLuTRx-lkp@xxxxxxxxx/
>> https://lore.kernel.org/oe-kbuild-all/202411251727.RFxtcpiI-lkp@xxxxxxxxx/
>>
>> I was planning to fix them when I sent the formal patches,
>> but since you have applied these RFC patches, could you please
>> fix them on your wip branch, or should I send separate patches
>> to fix them?
>
> This is how I fixed it. Is it ok?
>
> diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
> index 4286fd4a9324..b886fe2922ae 100644
> --- a/drivers/infiniband/hw/bnxt_re/main.c
> +++ b/drivers/infiniband/hw/bnxt_re/main.c
> @@ -822,17 +822,6 @@ static void bnxt_re_disassociate_ucontext(struct ib_ucontext *ibcontext)
> }
>
> /* Device */
> -
> -static struct bnxt_re_dev *bnxt_re_from_netdev(struct net_device *netdev)
> -{
> - struct ib_device *ibdev =
> - ib_device_get_by_netdev(netdev, RDMA_DRIVER_BNXT_RE);
> - if (!ibdev)
> - return NULL;
> -
> - return container_of(ibdev, struct bnxt_re_dev, ibdev);
> -}
> -
> static ssize_t hw_rev_show(struct device *device, struct device_attribute *attr,
> char *buf)
> {
> diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
> index 5ad7fe7e662f..4ddcd5860e0f 100644
> --- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
> +++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
> @@ -192,10 +192,12 @@ static void usnic_ib_handle_usdev_event(struct usnic_ib_dev *us_ibdev,
>
> static void usnic_ib_handle_port_event(struct ib_device *ibdev,
> struct net_device *netdev,
> - unsigned long event);
> + unsigned long event)
> {
> struct usnic_ib_dev *us_ibdev =
> container_of(ibdev, struct usnic_ib_dev, ib_dev);
> + struct ib_event ib_event;
> +
> mutex_lock(&us_ibdev->usdev_lock);
> switch (event) {
> case NETDEV_UP:
> diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
> index 137819184b3b..6b24438df917 100644
> --- a/drivers/infiniband/sw/siw/siw_verbs.c
> +++ b/drivers/infiniband/sw/siw/siw_verbs.c
> @@ -172,6 +172,7 @@ int siw_query_port(struct ib_device *base_dev, u32 port,
> struct ib_port_attr *attr)
> {
> struct siw_device *sdev = to_siw_dev(base_dev);
> + struct net_device *ndev;
> int rv;
>
> memset(attr, 0, sizeof(*attr));
> @@ -183,7 +184,12 @@ int siw_query_port(struct ib_device *base_dev, u32 port,
> attr->max_mtu = ib_mtu_int_to_enum(sdev->netdev->mtu);
> attr->active_mtu = ib_mtu_int_to_enum(sdev->netdev->mtu);
> attr->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_DEVICE_MGMT_SUP;
> - attr->state = ib_get_curr_port_state(sdev->ndev);
> + ndev = ib_device_get_netdev(base_dev, port);
> + if (ndev)
> + attr->state = ib_get_curr_port_state(ndev);
> + else
> + attr->state = IB_PORT_DOWN;
> + dev_put(ndev);

I think this is a simpler way:

attr->state = ib_get_curr_port_state(sdev->netdev);

But overall LGTM, thanks.

BTW, it seems the kernel test robot has reported some more warnings
after you applied these patches (and resolved the conflicts, I guess?).

Thanks,
Junxian

> attr->phys_state = attr->state == IB_PORT_ACTIVE ?
> IB_PORT_PHYS_STATE_LINK_UP : IB_PORT_PHYS_STATE_DISABLED;
> /*
>
>
>>
>> Junxian
>