Re: CONFIG_NO_HZ breaks blktrace timestamps
From: Guillaume Chazarain
Date: Fri Jan 11 2008 - 05:41:56 EST
David Dillow <dillowda@xxxxxxxx> wrote:
> Patched kernel, nohz=off:
> .clock_underflows : 213887
A little bit of warning about these patches, they are WIP, that's why I
did not send them earlier. It regress nohz=off.
A bit of context: these patches aim at making sure cpu_clock() on my
laptop (cpufreq enabled) never overflows/underflows/warps with
CONFIG_NOHZ enabled. With these patches, I have a few hundreds
overflows and underflows during early bootup, and then nothing :-)
Ingo Molnar <mingo@xxxxxxx> wrote:
> they are from the scheduler git tree (except the first debug patch), but
> queued up for v2.6.25 at the moment.
You are talking about "x86: scale cyc_2_nsec according to CPU
frequency" here, but I don't think it is at stakes here as David has:
> CONFIG_CPU_FREQ is not set
Let me review my patches myself to give a bit of context:
> sched: monitor clock underflows in /proc/sched_debug
This, I'd like to have it in .25 just for convenience.
> x86: scale cyc_2_nsec according to CPU frequency
You already know that one ;-)
> sched: fix rq->clock warps on frequency changes
This is a bugfix for .25 once the previous patch is applied. I don't
think it helps David, but it could help blktrace users with cpufreq
enabled.
> sched: Fix rq->clock overflows detection with CONFIG_NO_HZ
I think this one is the most important for David, but unfortunately it
has some problems.
> +static inline u64 max_skipped_ticks(struct rq *rq)
> +{
> + return nohz_on(cpu_of(rq)) ? jiffies - rq->last_tick_seen + 2 : 1;
> +}
Here, I initially wrote rq->last_tick_seen + 1 but experiments showed
that +2 was needed as I really saw deltas of 2 milliseconds.
These patches have two objectives:
- taking into account that jiffies are not always incremented by 1
thanks to nohz
- as the tick is stopped and restarted it may not tick at the exact
expected moment, so allow a window of 1 jiffie. If the tick occurs
during the right jiffy, we know the TSC is more precise than the tick
so don't correct the clock.
And the problem is that I seem to need a window of 2 jiffies, so I need
some help.
> sched: make sure jiffies is up to date before calling __update_rq_clock()
This is one is needed too but I'm less confident in its validity.
> scheduler_tick() is not called every jiffies
This one is a bit ugly and seems to break nohz=off.
> - if (unlikely(rq->clock < next_tick)) {
> + if (unlikely(rq->clock < next_tick - nohz_on(cpu) * TICK_NSEC)) {
No, I'm not proud of this :-(
Thanks.
--
Guillaume
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/