Re: CONFIG_NO_HZ breaks blktrace timestamps

From: Guillaume Chazarain
Date: Fri Jan 11 2008 - 05:41:56 EST


David Dillow <dillowda@xxxxxxxx> wrote:

> Patched kernel, nohz=off:
> .clock_underflows : 213887

A little bit of warning about these patches, they are WIP, that's why I
did not send them earlier. It regress nohz=off.

A bit of context: these patches aim at making sure cpu_clock() on my
laptop (cpufreq enabled) never overflows/underflows/warps with
CONFIG_NOHZ enabled. With these patches, I have a few hundreds
overflows and underflows during early bootup, and then nothing :-)

Ingo Molnar <mingo@xxxxxxx> wrote:

> they are from the scheduler git tree (except the first debug patch), but
> queued up for v2.6.25 at the moment.

You are talking about "x86: scale cyc_2_nsec according to CPU
frequency" here, but I don't think it is at stakes here as David has:

> CONFIG_CPU_FREQ is not set

Let me review my patches myself to give a bit of context:

> sched: monitor clock underflows in /proc/sched_debug

This, I'd like to have it in .25 just for convenience.

> x86: scale cyc_2_nsec according to CPU frequency

You already know that one ;-)

> sched: fix rq->clock warps on frequency changes

This is a bugfix for .25 once the previous patch is applied. I don't
think it helps David, but it could help blktrace users with cpufreq
enabled.

> sched: Fix rq->clock overflows detection with CONFIG_NO_HZ

I think this one is the most important for David, but unfortunately it
has some problems.

> +static inline u64 max_skipped_ticks(struct rq *rq)
> +{
> + return nohz_on(cpu_of(rq)) ? jiffies - rq->last_tick_seen + 2 : 1;
> +}

Here, I initially wrote rq->last_tick_seen + 1 but experiments showed
that +2 was needed as I really saw deltas of 2 milliseconds.

These patches have two objectives:
- taking into account that jiffies are not always incremented by 1
thanks to nohz
- as the tick is stopped and restarted it may not tick at the exact
expected moment, so allow a window of 1 jiffie. If the tick occurs
during the right jiffy, we know the TSC is more precise than the tick
so don't correct the clock.

And the problem is that I seem to need a window of 2 jiffies, so I need
some help.

> sched: make sure jiffies is up to date before calling __update_rq_clock()

This is one is needed too but I'm less confident in its validity.

> scheduler_tick() is not called every jiffies

This one is a bit ugly and seems to break nohz=off.

> - if (unlikely(rq->clock < next_tick)) {
> + if (unlikely(rq->clock < next_tick - nohz_on(cpu) * TICK_NSEC)) {

No, I'm not proud of this :-(

Thanks.

--
Guillaume
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/