Re: [RFC][PATCH] i386 x86-64 Eliminate Local APIC timer interrupt

From: Andi Kleen
Date: Mon May 02 2005 - 14:10:23 EST


I thought about it more and i really dislike the broadcast timer
more and more. Zwanes point on creating a lot of contention
on irq0 datastructures is also a very good one.

> Fully agree with you on the mess part :(. Few other options that we had
> thought about earlier:
> - Have some sort of callbacks while entering/exiting C3, and hand manipulate
> Local APIC timer counter to account for the time spent in C3 state. This is
> less intrusive change (affects only the system that has C3), but code starts
> getting ugly once we have time spent in C3 exceed a jiffy and spans across
> multiple jiffies. And we have to have some execute some code to handle all
> the lost local APIC timer idle ticks (for the statistics part) and can
> increase C3 wakeup latency higher.

It is a bit messy agreed, but no timer tick in idle has to do this
anyways. And we need to communicate with the ACPI idle code even
because we need to shorten delays artificially in lower sleep
modi (e.g. in C1 you dont want to sleep for longer than a ms
before waking up and switching into C2)

So given that we need this anyways (and I have it partly coded up
already) I think that is the way to go. The no tick code has
to query the backing time in this case anyways (or rather use the TSC
instead which is local - and I hope is still accurate even after C3)
and fix the timer up. So the infrastructure is there already
and the APIC problem can be handled in the same way.

BTW can you confirm that the APIC timer frequency is stable
over cpufreq changes on your x86-64 CPUs, or does this need
to be handled too?

The only drawback is that these systems will pretty much need
no timer tick in idle then to be reliable - i had hoped
to keep it an experimental option at the beginning; but I guess
we can accelerate it a bit.

So I would propose to go with this variant.

What do you think?


> - Other simpler solution is to remove idle from cpu_usage_stat and use
> (overall time - other accounted time) instead. This doesn't really solve
> the problem, but it is a workaround for all the code that depends on
> proper idle statistics.

What code is that exactly?

In no idle tick I just do the same thing as s390 and accumulate any lost ticks
up in a loop. However I have not done much measurements how affected
the CPU time statistics are.

But when the choice is between better power saving and slightly
less accurate statistics I will prefer better power saving any day - and
the people who really need good statistics are always free to turn
off the power saving, but I doubt that will happen often as long
as the statistics are "good enough". e.g. we can be factor 10 worse
and not be worse than 2.4 with HZ=100. That is a lot of play ground.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/