Re: [PATCH v2 1/1] net/mlx5: Added cond_resched() to crdump collection
From: Mohamed Khalfella
Date: Wed Sep 04 2024 - 23:37:12 EST
On 2024-09-03 14:14:58 +0200, Alexander Lobakin wrote:
> From: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
> Date: Fri, 30 Aug 2024 11:01:19 -0700
>
> > On 2024-08-30 15:07:45 +0200, Alexander Lobakin wrote:
> >> From: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
> >> Date: Thu, 29 Aug 2024 15:38:56 -0600
> >>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> >>> index 6b774e0c2766..bc6c38a68702 100644
> >>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> >>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
> >>> @@ -269,6 +269,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
> >>> {
> >>> unsigned int next_read_addr = 0;
> >>> unsigned int read_addr = 0;
> >>> + unsigned int count = 0;
> >>>
> >>> while (read_addr < length) {
> >>> if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr,
> >>> @@ -276,6 +277,9 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
> >>> return read_addr;
> >>>
> >>> read_addr = next_read_addr;
> >>> + /* Yield the CPU every 128 register reads */
> >>> + if ((++count & 0x7f) == 0)
> >>> + cond_resched();
> >>
> >> Why & 0x7f? Could it be written more clearly?
> >>
> >> if (++count == 128) {
> >> cond_resched();
> >> count = 0;
> >> }
> >>
> >> Also, I'd make this open-coded value a #define somewhere at the
> >> beginning of the file with a comment with a short explanation.
>
> This is still valid.
Done. See <1>.
>
> >
> > What you are suggesting should also work. I copied the style from
> > mlx5_vsc_wait_on_flag() to keep the code consistent. The comment above
> > the line should make it clear.
>
> I just don't see a reason to make the code less readable.
<1> Now that I am looking at mlx5_vsc_wait_on_flag() again, I realize
that code does not reset retries to 0 because it needs to detect when
retries reaches VSC_MAX_RETRIES. That is not the case here, so I will
update the code as suggested.
>
> >
> >>
> >> BTW, why 128? Not 64, not 256 etc? You just picked it, I don't see any
> >> explanation in the commitmsg or here in the code why exactly 128. Have
> >> you tried different values?
> >
> > This is mostly subjective. For the numbers I saw in the lab, this will
> > release the CPU after ~4.51ms. If crdump takes ~5s, the code should
> > release the CPU after ~18.0ms. These numbers look reasonable to me.
>
> So just mention in the commit message that you tried different values
> and 128 gave you the best results.
I will update the commit message in v3.