[PATCH 1/1] RDMA/mlx5: Release CPU for other processes in mlx5_free_cmd_msg()

From: Anand Khoje
Date: Tue May 21 2024 - 23:33:27 EST


In non-FLR context, a CX-5 device at times requests the release of
~8 million device pages. Reclaiming them needs a very large number of
cmd mailboxes, which must be freed once the pages have been reclaimed.
Freeing that many mailboxes consumes several seconds of CPU time, and
on non-preemptible kernels this starves critical processes on that
CPU's run queue. To alleviate this, relinquish the CPU periodically but
conditionally: call cond_resched() whenever the free loop has been
running for more than RESCHED_MSEC (2) milliseconds.

Orabug: 36275016

Signed-off-by: Anand Khoje <anand.a.khoje@xxxxxxxxxx>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 9c21bce..9fbf25d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1336,16 +1336,23 @@ static struct mlx5_cmd_msg *mlx5_alloc_cmd_msg(struct mlx5_core_dev *dev,
 	return ERR_PTR(err);
 }

+#define RESCHED_MSEC 2
 static void mlx5_free_cmd_msg(struct mlx5_core_dev *dev,
 			      struct mlx5_cmd_msg *msg)
 {
 	struct mlx5_cmd_mailbox *head = msg->next;
 	struct mlx5_cmd_mailbox *next;
+	unsigned long start_time = jiffies;

 	while (head) {
 		next = head->next;
 		free_cmd_box(dev, head);
 		head = next;
+		if (time_after(jiffies, start_time + msecs_to_jiffies(RESCHED_MSEC))) {
+			mlx5_core_warn_rl(dev, "Spent more than %d msecs, yielding CPU\n", RESCHED_MSEC);
+			cond_resched();
+			start_time = jiffies;
+		}
 	}
 	kfree(msg);
 }
--
1.8.3.1
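
For readers who want to experiment with the pattern outside the kernel,
the following is a minimal user-space sketch of the same idea: free a
long singly linked list and give up the CPU whenever the loop has run
longer than a fixed time budget. It is an analogy, not the driver code.
The names struct node, build_list(), free_list_with_yield() and
YIELD_BUDGET_MSEC are hypothetical, sched_yield() stands in for
cond_resched(), and CLOCK_MONOTONIC stands in for jiffies.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical budget, mirroring RESCHED_MSEC from the patch. */
#define YIELD_BUDGET_MSEC 2

/* Hypothetical stand-in for a chain of command mailboxes. */
struct node {
	struct node *next;
};

/* Milliseconds elapsed on the monotonic clock since *since. */
static long long elapsed_msec(const struct timespec *since)
{
	struct timespec now;
	long long nsec;

	clock_gettime(CLOCK_MONOTONIC, &now);
	nsec = (long long)(now.tv_sec - since->tv_sec) * 1000000000LL +
	       (long long)(now.tv_nsec - since->tv_nsec);
	return nsec / 1000000LL;
}

/*
 * Free every node in the list. Whenever more than YIELD_BUDGET_MSEC
 * has passed since the last yield, call sched_yield() and restart the
 * timer. Returns the number of nodes freed; *yields is incremented
 * once per yield.
 */
static size_t free_list_with_yield(struct node *head, size_t *yields)
{
	struct timespec start;
	size_t freed = 0;

	assert(yields != NULL);
	clock_gettime(CLOCK_MONOTONIC, &start);

	while (head) {
		struct node *next = head->next;

		free(head);
		head = next;
		freed++;

		if (elapsed_msec(&start) > YIELD_BUDGET_MSEC) {
			sched_yield();
			(*yields)++;
			clock_gettime(CLOCK_MONOTONIC, &start);
		}
	}
	return freed;
}

/* Build a list of up to n nodes; stops early if malloc() fails. */
static struct node *build_list(size_t n)
{
	struct node *head = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		struct node *nd = malloc(sizeof(*nd));

		if (!nd)
			break;
		nd->next = head;
		head = nd;
	}
	return head;
}
```

Note one difference from the patch: reading jiffies is a cheap memory
load, while this sketch calls clock_gettime() on every iteration. A
user-space version that cares about overhead would likely check the
clock only every N nodes.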