Re: [PATCH v4 2/8] tty: serial: Add uart_fifo_timeout_ms()
From: Doug Anderson
Date: Thu Jun 13 2024 - 10:03:11 EST
Hi,
On Wed, Jun 12, 2024 at 11:56 PM Ilpo Järvinen
<ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 12 Jun 2024, Doug Anderson wrote:
> > On Wed, Jun 12, 2024 at 12:38 AM Ilpo Järvinen
> > <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, 10 Jun 2024, Douglas Anderson wrote:
> > >
> > > > The current uart_fifo_timeout() returns jiffies, which is not always
> > > > the most convenient for callers. Add a variant uart_fifo_timeout_ms()
> > > > that returns the timeout in milliseconds.
> > > >
> > > > NOTES:
> > > > - msecs_to_jiffies() rounds up, unlike nsecs_to_jiffies(). This is
> > > > because msecs_to_jiffies() is actually intended for device drivers
> > > > to calculate timeout value. This means we don't need to take the max
> > > > of the timeout and "1" since the timeout will always be > 0 ms (we
> > > > add 20 ms of slop).
> > > > - uart_fifo_timeout_ms() returns "unsigned int" but we leave
> > > > uart_fifo_timeout() returning "unsigned long". This matches the
> > > > types of msecs_to_jiffies().
> > > >
> > > > Suggested-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> > > > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > > > ---
> > > >
> > > > Changes in v4:
> > > > - New
> > > >
> > > > include/linux/serial_core.h | 15 +++++++++++----
> > > > 1 file changed, 11 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> > > > index 8cb65f50e830..97968acfd564 100644
> > > > --- a/include/linux/serial_core.h
> > > > +++ b/include/linux/serial_core.h
> > > > @@ -889,14 +889,21 @@ unsigned int uart_get_divisor(struct uart_port *port, unsigned int baud);
> > > > /*
> > > > * Calculates FIFO drain time.
> > > > */
> > > > -static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > > > +static inline unsigned int uart_fifo_timeout_ms(struct uart_port *port)
> > > > {
> > > > u64 fifo_timeout = (u64)READ_ONCE(port->frame_time) * port->fifosize;
> > > > + unsigned int fifo_timeout_ms = div_u64(fifo_timeout, NSEC_PER_MSEC);
> > > >
> > > > - /* Add .02 seconds of slop */
> > > > - fifo_timeout += 20 * NSEC_PER_MSEC;
> > > > + /*
> > > > + * Add .02 seconds of slop. This also helps account for the fact that
> > > > + * when we converted from ns to ms that we didn't round up.
> > > > + */
> > > > + return fifo_timeout_ms + 20;
> > > > +}
> > > >
> > > > - return max(nsecs_to_jiffies(fifo_timeout), 1UL);
> > > > +static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > > > +{
> > > > + return msecs_to_jiffies(uart_fifo_timeout_ms(port));
> > > > }
> > >
> > > Hi,
> > >
> > > This is definitely towards the right direction! However, it now does
> > > double conversion, first div_u64() and then msecs_to_jiffies(). Perhaps it
> > > would be better to retain the nsecs version (maybe rename it to _ns for
> > > consistency) and add _ms variant that does the nsec -> msec conversion.
> >
> > I spent a bit of time thinking about it and I don't agree. If you feel
> > very strongly about it or someone else wants to jump in and break the
> > tie then I can look again, but:
> >
> > 1. The comment before nsecs_to_jiffies() specifically states that it's
> > not supposed to be used for this purpose. Specifically, it says:
> >
> > * Unlike {m,u}secs_to_jiffies, type of input is not unsigned int but u64.
> > * And this doesn't return MAX_JIFFY_OFFSET since this function is designed
> > * for scheduler, not for use in device drivers to calculate timeout value.
> >
> > ...so switching away from nsecs_to_jiffies() to msecs_to_jiffies() is
> > arguably a "bugfix", or at least avoids using the API in a way that's
> > against the documentation.
>
> Okay, I see. However, there's no way around using u64 here even with your
> version that does not use nsecs_to_jiffies() because nsecs is the most
> useful form of input when starting from frame_time, usecs is a bit
> coarse-grained for higher data rates.
Right. We have to start with u64 because the frame time is in ns and
we can only fit ~4 seconds worth of ns in 32-bits. That seems iffy.
> > 2. As mentioned in the commit message, nsecs_to_jiffies() truncates
> > where msecs_to_jiffies() rounds up. Presumably this difference is
> > related to the comment above where the "ns" version is intended for
> > the scheduler. Using the "ms" version allows us to get rid of the
> > extra call to "max()" which is a benefit. Technically since the
> > timeout is at least 20 ms the minimum HZ is 100 I guess we didn't need
> > the max anyway, but I guess someone thought it was cleaner and now we
> > can definitely get rid of it.
> >
> > 3. These functions are inline anyway, so I don't think it's causing a
> > huge bloat of instructions. In fact, moving from 64-bit math to 32-bit
> > math sooner could make the code smaller.
> >
> > 4. I don't feel like it hurts the readability to convert down to ms
> > and then to jiffies. In fact, IMO it helps since it makes it more
> > obvious that we're working with ms.
>
> I'd be lying if I'd say I feel strongly about it
Fair enough. If someone wants to throw in an opinion and tiebreak then they can.
> but my only argument
> involves doing an extra divide which is somewhat costly. It's a
> plain 32-bit divide though so not as bad as the u64 one that is
> unavoidable.
We shouldn't be calling this in a loop anyway, so it's unlikely to
matter. In any case, I'd note that with the old code we had:
1. 64-bit multiply (time * fifosize)
2. 64-bit addition (result + 20ms)
3. 64-bit => 32-bit division (to jiffies)
4. 32-bit comparison against the value 1.
5. Conditional setting of the value to 1.
Now we have:
1. 64-bit multiply (time * fifosize)
2. 64-bit => 32-bit division (to ms)
3. 32-bit addition with a small immediate (20)
4. 32-bit addition (div round up) if HZ != 1000
5. 32-bit division (div round up) if HZ != 1000
I didn't try disassembling to see what the compiler did and it would
be different for each compiler / ISA / optimization level / value of
HZ, but I guess my point is that while we have one more divide (unless
HZ == 1000) we may have one less conditional. We're also tending to do
our math with small immediates which some ISAs can handle more
efficiently.
I think the real answer, though, is that this doesn't really matter
and that we should pick the solution that's cleaner/easier to
understand. I'm still in favor of the patch as it is. As I said, if
folks feel really strongly then it doesn't matter and I can change it,
but otherwise I'd rather keep it the way it is.
-Doug