Re: [PATCH] aoe: Use 64-bit timestamp in frame

From: Arnd Bergmann
Date: Tue May 12 2015 - 05:44:38 EST


On Monday 11 May 2015 21:00:25 Ed Cashin wrote:
> First, thanks for the patch. I do appreciate the attempt to simplify
> this part of the driver, but I don't think that this patch is good to merge.
>
> I'll make some comments inline below.
>
> On 05/10/2015 10:35 PM, Tina Ruchandani wrote:
> > 'struct frame' uses two variables to store the sent timestamp - 'struct
> > timeval' and jiffies. jiffies is used to avoid discrepancies caused by
> > updates to system time. 'struct timeval' uses 32-bit representation for
> > seconds which will overflow in year 2038.
>
> The comment in the deleted lines below mentions the fact that the
> overflow does not matter for calculating rough-grained deltas in time.
> So there is no problem in 2038 or on systems with the clock set to 2038
> accidentally.

To clarify: because 'struct timeval' is known to be broken in general,
we want to remove it from the kernel entirely and replace it with
something that is known to be correct in all cases, even for the drivers
that are currently correct. Out of the ~250 files in the kernel that
use 'timeval', we can't easily tell which ones are correct, so by replacing
them all, we can eliminate all the bugs.

We should be able to do that in a way that generally improves all the
drivers, because using 'timeval' tends to be suboptimal to start with.

If the currently available interfaces make things worse for the aoe
driver, we may have to add extra infrastructure, and get something that
also helps the conversion of other drivers.

> > This patch does the following:
> > - Replace the use of 'struct timeval' and jiffies with ktime_t, which
> > is a 64-bit timestamp and is year 2038 safe.
> > - ktime_t provides both long range (like jiffies) and high resolution
> > (like timeval). Using ktime_get (monotonic time) instead of wall-clock
> > time prevents any discprepancies caused by updates to system time.
>
> But the patch only changes the struct frame data. The aoe driver
> only has the struct frame for an incoming AoE response when that
> response is "expected". If the response comes in a bit late, the frame
> may have already been used for a new command.
>
> You can see that in aoecmd_ata_rsp when getframe_deferred returns
> NULL and tsince is called instead of tsince_hr.
>
> In that case, there is still information about the timing embedded in
> the AoE tag. The send time in jiffies is a rough-grained record of the
> send time, and it's extracted from the tag. For these "unexpected"
> responses, this timing information can improve performance significantly
> without introducing extra overhead or risk.

That path is not changed at all by this patch, right? It also looks
like the jiffies information from there is only used to print an
error message.

> I don't think the patch considers this aspect of the way the round trip
> time is calculated, and I don't think the primary motivation is justified
> (if that's 2038 safety, which we have already).
>
> Simplifying it would be nice, but it would be difficult to thoroughly test
> all of the performance implications. There are still people using 32-bit
> systems, for example.

Here is my analysis regarding the performance implications:

- Avoiding the access to 'jiffies' in a few places has basically zero
impact in small systems, but may help on large SMP machines because it
avoids cache line bouncing when you have use multiple concurrent accesses
to jiffies.

- Replacing do_gettimeofday() with ktime_get() will improve things slightly
on all machines, because it avoids a 32-bit division that takes a couple
of cycles, up to hundreds of cycles on some CPU architectures.

- This leaves a single change that is currently making things worse in
tsince_hr():

> > - do_gettimeofday(&now);
> > - n = now.tv_usec - f->sent.tv_usec;
> > - n += (now.tv_sec - f->sent.tv_sec) * USEC_PER_SEC;
> > + now = ktime_get();
> > + n = ktime_to_us(ktime_sub(now, f->sent));

ktime_to_us() requires a constant 64-bit integer division that is
significantly more expensive than the 32-bit it replaces. Thanks to your
analysis, I think it's fair to say that the function is indeed timing
critical and we should try hard avoid introducing this overhead.

There are of course multiple ways to do this. One way would be to
change the code to work on 32-bit nanoseconds instead of 32-bit
microseconds. This requires proving that the we cannot exceed
4.29 seconds of round-trip time in calc_rttavg().
Is that a valid assumption or not?

If not, we could replace do_gettimeofday() with ktime_get_ts64().
This will ensure we don't need a 64-bit division when converting
the ts64 to a 32-bit microsecond value, and combined with the
conversion is still no slower than do_gettimeofday(), and it
still avoids the double bookkeeping because it uses a monotonic
timebase that is robust against settimeofday.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/