Re: [patch] change futex_wait() to hrtimers
From: Nick Piggin
Date: Mon Mar 12 2007 - 10:32:37 EST
On Mon, Mar 12, 2007 at 10:12:14AM -0400, Theodore Tso wrote:
> On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote:
> > On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
> > > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
> > > > Ingo Molnar <mingo@xxxxxxx> writes:
> > > > >
> > > > > the only correct approach is the use of hrtimers, and a patch exists for
> > > > > that - see below. This has been included in -rt for quite some time.
> > > >
> > > > But isn't that bad for power management? You'll likely get more
> > > > idle wakeups, won't you?
> > >
> > > Why so ? It comes more precise, but only once.
> >
> > When it's clustered around the jiffies interval then wakeups from
> > multiple processes will be somewhat batched. With a precise wakeup you'll
> > get wakeups all over the jiffies period, won't you?
>
> What we probably need in the long-term, and not just for high
> precision wakeups, is we need a way for waiters (either in the kernel
> or in userspace) to specify a desired precision in their timers. Is
> it, "wake me up in a second, exactly", or "wake me up in a second,
> plus or minus 10ms"? (or 50ms? or 100ms?).

Would this work, or would it just create more confusion for the API user?
I mean, all sleeps can only guarantee "no less than".

Would a binary (exact as possible / relaxed if needed) flag be enough?
Or perhaps ternary (exact/relaxed/batched), where relaxed could add an
extra jiffy or so, and batched is very relaxed and may delay the wakeup
until up to double the value of the timeout.

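To make the ternary idea a bit more concrete, here is a rough sketch in C.
Everything in it is made up for illustration: the enum, the JIFFY_NS value
(I assume HZ=250) and the helper are not existing kernel interfaces. It only
computes the latest expiry each class would tolerate; the earliest is always
the requested timeout itself.

```c
#include <stdint.h>

/* Hypothetical precision classes, named after the three options above. */
enum sleep_precision {
    SLEEP_EXACT,    /* as exact as possible */
    SLEEP_RELAXED,  /* may add an extra jiffy or so */
    SLEEP_BATCHED   /* may delay until up to double the timeout */
};

/* Assumed tick length: HZ=250 gives a 4 ms jiffy. */
#define JIFFY_NS 4000000ULL

/*
 * Latest acceptable expiry, in ns from now, for a requested timeout.
 * The earliest acceptable expiry is always timeout_ns ("no less than").
 * Results saturate at UINT64_MAX instead of wrapping.
 */
static uint64_t latest_expiry_ns(uint64_t timeout_ns, enum sleep_precision p)
{
    switch (p) {
    case SLEEP_RELAXED:
        if (timeout_ns > UINT64_MAX - JIFFY_NS)
            return UINT64_MAX;
        return timeout_ns + JIFFY_NS;
    case SLEEP_BATCHED:
        if (timeout_ns > UINT64_MAX / 2)
            return UINT64_MAX;
        return timeout_ns * 2;
    case SLEEP_EXACT:
    default:
        return timeout_ns;
    }
}
```

With these assumptions a 10 ms timeout may fire anywhere in [10 ms, 14 ms]
when relaxed and anywhere in [10 ms, 20 ms] when batched, which is the window
the timer code would be free to coalesce wakeups in.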
> This becomes especially important if we want the tickless code to
> really shine as far as power management is concerned. Unfortunately,
> the POSIX timer abstraction doesn't give this kind of flexibility
> easily, so it's going to be a while before we see significant
> userspace adoption of such a kernel feature, but I think it's
> something that would be still worthwhile to add.

But given that we know most userspace API timeouts are broadly just
"equal to or greater than" guarantees, we could add another timeout flag
to mark a timeout as coming from userspace, and make the handling of such
timeouts controllable by sysctl.

Sure, it isn't ideal, but for those who really want power / hypervisor
savings, it could be useful.

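Roughly what I have in mind, as a sketch only: the flag name and the helper
below are hypothetical, and the granularity argument stands in for whatever
value the sysctl would hold. Flagged timeouts are rounded up to the next
multiple of the granularity, so they can only fire later, never earlier, and
timeouts from different processes land on the same boundaries.

```c
#include <stdint.h>

/* Hypothetical flag marking a timeout that originated in userspace. */
#define TIMEOUT_FROM_USERSPACE 0x1u

/*
 * Round a flagged expiry up to the next multiple of granularity_ns.
 * granularity_ns models the sysctl value; 0 means the feature is off.
 * Unflagged (in-kernel) timeouts are returned unchanged.
 */
static uint64_t effective_expiry_ns(uint64_t expiry_ns, unsigned int flags,
                                    uint64_t granularity_ns)
{
    uint64_t rem, pad;

    if (!(flags & TIMEOUT_FROM_USERSPACE) || granularity_ns == 0)
        return expiry_ns;

    rem = expiry_ns % granularity_ns;
    if (rem == 0)
        return expiry_ns;            /* already on a boundary */

    pad = granularity_ns - rem;
    if (expiry_ns > UINT64_MAX - pad)
        return expiry_ns;            /* would overflow: leave it alone */

    return expiry_ns + pad;
}
```

For example, with a 4 ms granularity a userspace expiry at 10 ms moves to
12 ms, while the same expiry without the flag stays at 10 ms.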
BTW, my futex man page says timeout's contents "describe the maximum duration
of the wait". Surely that should be *minimum*? Michael cc'ed.