Re: [PATCH] net/mlx5: reduce stack usage in FW tracer

From: Arnd Bergmann
Date: Mon Sep 09 2019 - 16:19:08 EST

On Mon, Sep 9, 2019 at 9:39 PM Saeed Mahameed <saeedm@xxxxxxxxxxxx> wrote:
> On Fri, 2019-09-06 at 17:11 +0200, Arnd Bergmann wrote:
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> > @@ -557,16 +557,16 @@ static void mlx5_tracer_print_trace(struct
> > tracer_string_format *str_frmt,
> > struct mlx5_core_dev *dev,
> > u64 trace_timestamp)
> > {
> > - char tmp[512];
> > -
> Hi Arnd, thanks for the patch,
> this function is very perfomance critical when fw traces are activated
> to pull some fw content on error situations, using kmalloc here might
> become a problem and stall the system further more if the problem was
> initially due to lack of memory.
> since this function only needs 512 bytes maybe we should mark it as
> noinline to avoid any extra stack usages on the caller function
> mlx5_fw_tracer_handle_traces ?

That would shut up the warning, but doesn't sound right either.

If it's performance critical indeed, maybe the best solution would
be to also avoid the snprintf(), as that is also a rather heavyweight

I could not find an easy solution for this, but I did notice the unusual way
this deals with a variable format string passed into mlx5_tracer_print_trace
along with a set of parameters, which opens up a set of possible
format string vulnerabilities as well as making mlx5_tracer_print_trace()
a bit expensive. You also take a mutex and free memory in there,
which obviously then also got allocated in the fast path.

To do this right, a better approach may be to just rely on ftrace, storing
the (pointer to the) format string and the arguments in the buffer without
creating a string. Would that be an option here?

A more minimal approach might be to move what is now the on-stack
buffer into the mlx5_fw_tracer function. I see that you already store
a copy of the string in there from mlx5_fw_tracer_save_trace(),
which conveniently also holds a mutex already that protects
it from concurrent access.