Re: [PATCH] tracing: Fix wraparound problems in "uptime" tracer

From: Steven Rostedt
Date: Mon Jun 30 2014 - 14:40:24 EST


On Mon, 30 Jun 2014 11:17:18 -0700
Tony Luck <tony.luck@xxxxxxxxx> wrote:

> There seem to be no non-racy solutions ... I've been wondering
> about giving up on a generic jiffies_to_nsec() function because
> people might use it in cases where the races might be likley to
> bite them. For my need, I think that "perfect is the enemy of good":
>
> 1) The race window is only a few microseconds wide
> 2) It only exists on 32-bit kernels - which are dying out on server
> systems because they can't handle the amounts of memory on modern
> machines.
> 3) It opens every 49 days (on a HZ=1000 system)
> 4) I'm logging error events that happen at a "per-month" frequency (or lower)
> 5) If the race does happen - the visible result is that we have a
> bad time logged against an error event.
>
> so what about this: ...
>
> From: Tony Luck <tony.luck@xxxxxxxxx>
>
> The "uptime" tracer added in:
> commit 8aacf017b065a805d27467843490c976835eb4a5
> tracing: Add "uptime" trace clock that uses jiffies
> has wraparound problems when the system has been up more
> than 1 hour 11 minutes and 34 seconds. It converts jiffies
> to nanoseconds using:
> (u64)jiffies_to_usecs(jiffy) * 1000ULL
> but since jiffies_to_usecs() only returns a 32-bit value, it
> truncates at 2^32 microseconds. An additional problem on 32-bit
> systems is that the argument is "unsigned long", so fixing the
> return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
> system).
>
> We can't provide a full features jiffies_to_nsec() function in
> any safe way (32-bit systems need locking to read the full 64-bit
> jiffies value). Just do the best we can here and recognise that
> 32-bit systems may seem some timestamp anomolies if jiffies64
> was in the middle of rolling over a 2^32 boundary.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
> kernel/timeconst.bc | 6 ++++++
> kernel/trace/trace_clock.c | 10 ++++++++--
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/timeconst.bc b/kernel/timeconst.bc
> index 511bdf2cafda..a5fef7a7fb27 100644
> --- a/kernel/timeconst.bc
> +++ b/kernel/timeconst.bc
> @@ -100,6 +100,12 @@ define timeconst(hz) {
> print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
> print "\n"
>
> + obase=10
> + cd=gcd(hz,1000000000)
> + print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
> + print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
> + print "\n"
> +
> print "#endif /* KERNEL_TIMECONST_H */\n"
> }
> halt
> diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
> index 26dc348332b7..dc5b11b9f8a4 100644
> --- a/kernel/trace/trace_clock.c
> +++ b/kernel/trace/trace_clock.c
> @@ -59,13 +59,19 @@ u64 notrace trace_clock(void)
>
> /*
> * trace_jiffy_clock(): Simply use jiffies as a clock counter.
> + * This usage of jiffies_64 isn't safe on 32-bit, but we may be
> + * called from NMI context, and we have no safe way to get a timestamp.
> */
> u64 notrace trace_clock_jiffies(void)
> {
> - u64 jiffy = jiffies - INITIAL_JIFFIES;
> + u64 jiffy = jiffies_64 - INITIAL_JIFFIES;
>
> /* Return nsecs */
> - return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
> +#if !(NSEC_PER_SEC % HZ)
> + return (NSEC_PER_SEC / HZ) * jiffy;
> +#else
> + return (jiffy * HZ_TO_NSEC_NUM) / HZ_TO_NSEC_DEN;

Wont this break on 32 bit systems. That is, you can't divide 64bit
integers without using do_div().

-- Steve

> +#endif
> }
>
> /*

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/