Re: [PATCH 1/1] RDMA/mlx5: Release CPU for other processes in mlx5_free_cmd_msg()

From: Shay Drori
Date: Sun May 26 2024 - 11:24:14 EST


Hi Anand.

First, the correct mailing list for this patch is
netdev@xxxxxxxxxxxxxxx; please send the next version there.

On 22/05/2024 6:32, Anand Khoje wrote:
In a non-FLR context, CX-5 at times requests the release of ~8 million
device pages. This requires a huge number of cmd mailboxes, which must be
released once the pages are reclaimed. Releasing that many cmd mailboxes
consumes many seconds of CPU time, and on non-preemptible kernels this
starves critical processes on that CPU’s run queue. To alleviate this,
this patch relinquishes the CPU periodically but conditionally.

Orabug: 36275016

This tag doesn't seem relevant.


Signed-off-by: Anand Khoje <anand.a.khoje@xxxxxxxxxx>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 9c21bce..9fbf25d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1336,16 +1336,23 @@ static struct mlx5_cmd_msg *mlx5_alloc_cmd_msg(struct mlx5_core_dev *dev,
return ERR_PTR(err);
}
+#define RESCHED_MSEC 2


What if you add cond_resched() on every iteration of the loop? Does it
take much longer to release 8 million pages, or about the same?
If it does matter, maybe 2 ms is too high a frequency? 20 ms? 200 ms?

Thanks

 static void mlx5_free_cmd_msg(struct mlx5_core_dev *dev,
 			      struct mlx5_cmd_msg *msg)
 {
 	struct mlx5_cmd_mailbox *head = msg->next;
 	struct mlx5_cmd_mailbox *next;
+	unsigned long start_time = jiffies;
 	while (head) {
 		next = head->next;
 		free_cmd_box(dev, head);
 		head = next;
+		if (time_after(jiffies, start_time + msecs_to_jiffies(RESCHED_MSEC))) {
+			mlx5_core_warn_rl(dev, "Spent more than %d msecs, yielding CPU\n", RESCHED_MSEC);
+			cond_resched();
+			start_time = jiffies;
+		}
 	}
 	kfree(msg);
 }
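
For comparison, the two strategies discussed in this thread (yield on every
iteration versus yield only after a time threshold) can be sketched in
userspace C. This is an illustration under stated assumptions, not the driver
code: struct mailbox, build_list(), free_list_yield_each() and
free_list_yield_timed() are hypothetical names, sched_yield() stands in for
cond_resched(), and CLOCK_MONOTONIC stands in for jiffies. Note that
sched_yield() always enters the scheduler, whereas cond_resched() only
reschedules when a reschedule is pending, so relative timings from this sketch
would not carry over to the kernel.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for struct mlx5_cmd_mailbox. */
struct mailbox {
	struct mailbox *next;
};

/* Build a singly linked list of n nodes; NULL if n == 0 or malloc fails. */
static struct mailbox *build_list(size_t n)
{
	struct mailbox *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct mailbox *m = malloc(sizeof(*m));

		if (!m) {
			/* Undo the partial list before reporting failure. */
			while (head) {
				struct mailbox *next = head->next;

				free(head);
				head = next;
			}
			return NULL;
		}
		m->next = head;
		head = m;
	}
	return head;
}

/* Milliseconds elapsed since *since, using the monotonic clock. */
static long long elapsed_ms(const struct timespec *since)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return ((long long)(now.tv_sec - since->tv_sec) * 1000000000LL +
		(now.tv_nsec - since->tv_nsec)) / 1000000LL;
}

/* Variant A: yield after every freed node. Returns the number of yields. */
static size_t free_list_yield_each(struct mailbox *head)
{
	size_t yields = 0;

	while (head) {
		struct mailbox *next = head->next;

		free(head);
		head = next;
		sched_yield();	/* userspace stand-in for cond_resched() */
		yields++;
	}
	return yields;
}

/*
 * Variant B: yield only once more than interval_ms has passed since the
 * last yield, mirroring the time_after() check in the patch.
 */
static size_t free_list_yield_timed(struct mailbox *head, long long interval_ms)
{
	struct timespec start;
	size_t yields = 0;

	assert(interval_ms >= 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	while (head) {
		struct mailbox *next = head->next;

		free(head);
		head = next;
		if (elapsed_ms(&start) > interval_ms) {
			sched_yield();
			yields++;
			clock_gettime(CLOCK_MONOTONIC, &start);
		}
	}
	return yields;
}
```

Calling free_list_yield_each(build_list(n)) and
free_list_yield_timed(build_list(n), 2) for a large n, and timing each call,
gives a rough feel for the per-iteration cost of the clock read versus the
yield. The numbers asked for above would still have to come from measuring
the real loop on the affected hardware.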