RE: [PATCH] drivers:atlx: Prevent integer overflow in statistics aggregation

From: Keller, Jacob E
Date: Tue Oct 08 2024 - 13:15:32 EST




> -----Original Message-----
> From: Rand Deeb <rand.sec96@xxxxxxxxx>
> Sent: Tuesday, October 8, 2024 9:59 AM
> To: Jakub Kicinski <kuba@xxxxxxxxxx>
> Cc: Chris Snook <chris.snook@xxxxxxxxx>; David S . Miller
> <davem@xxxxxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>; Paolo Abeni
> <pabeni@xxxxxxxxxx>; Christian Marangi <ansuelsmth@xxxxxxxxx>;
> netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; deeb.rand@xxxxxxxxxxxx;
> lvc-project@xxxxxxxxxxxxxxxx; voskresenski.stanislav@xxxxxxxxxxxx
> Subject: Re: [PATCH] drivers:atlx: Prevent integer overflow in statistics aggregation
>
> On Tue, Oct 8, 2024 at 3:27 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> > On Mon, 7 Oct 2024 12:29:36 +0300 Rand Deeb wrote:
> > > The `atl1_inc_smb` function aggregates various RX and TX error counters
> > > from the `stats_msg_block` structure. Currently, the arithmetic operations
> > > are performed using `u32` types, which can lead to integer overflow when
> > > summing large values. This overflow occurs before the result is cast to
> > > a `u64`, potentially resulting in inaccurate network statistics.
> > >
> > > To mitigate this risk, each operand in the summation is explicitly cast to
> > > `u64` before performing the addition. This ensures that the arithmetic is
> > > executed in 64-bit space, preventing overflow and maintaining accurate
> > > statistics regardless of the system architecture.
> >
> > Thanks for the nice commit message, but honestly I don't think
> > the error counters can overflow u32 on an ancient NIC like this.
>

2^32-1 = 4294967295

If we assume that the card operates for at least 10 years, you will need an error rate of ~14 per second to overflow a 32bit counter over the 10 year period. Longer operation uptime time could decrease the error rate. That does seem unlikely.

> Hi Jakub,
>
> Thanks for your feedback, much appreciated!
>
> Honestly, when I was investigating this, I had the same thoughts regarding
> the possibility of the counters overflowing. However, I want to clarify
> that the variables where we store the results of these summations (like
> new_rx_errors, new_tx_errors, etc.) are already u64 types. Given that, it
> seems logical to cast the operands to u64 before the addition to ensure
> consistency and avoid any potential issues during the summation.
>
> Additionally, all counters in the atl1_sft_stats structure are also
> defined as u64, which reinforces the rationale for casting the operands in
> the summation as well.
>
> Thanks again for your input!
>
> Best regards,
> Rand Deeb

Still, if the resulting storage is already 64bit, I think it does make sense to do the arithmetic in 64bit.