Re: dynamic-hz

From: Andrew Morton
Date: Mon Dec 13 2004 - 23:35:28 EST

Nish Aravamudan <nish.aravamudan@xxxxxxxxx> wrote:
> On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@xxxxxxxx> wrote:
> > Andrea Arcangeli <andrea@xxxxxxx> wrote:
> > >
> > > The patch only does HZ at dynamic time. But of course it's absolutely
> > > trivial to define it at compile time, it's probably a 3 liner on top of
> > > my current patch ;). However, personally I don't think the three-liner
> > > will be worth the few extra seconds spent configuring the kernel ;).
> >
> > We still have 1000-odd places which do things like
> >
> > schedule_timeout(HZ/10);
> Yes, yes, we do :) I replaced far more than I ever thought I could...
> There are a few issues I have with the remaining schedule_timeout()
> calls which I think fit ok with this thread... I'd especially like
> your input, Andrew, as you end up getting most of my patches from KJ.
> Many drivers use
> set_current_state(TASK_{UN,}INTERRUPTIBLE);
> schedule_timeout(1); // or some other small value < 10
> This may or may not hide a dependency on a particular HZ value. If the
> code is somewhat old, perhaps the author intended the task to sleep
> for 1 jiffy when HZ was equal to 100. That meant that they ended up
> sleeping for 10 ms. If the code is new, the author intends that the
> task sleeps for 1 ms (HZ==1000). The question is, what should the
> replacement be?

Presumably they meant 10 milliseconds. Or at least, that is the delay
which the developer did his testing with.
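
To make the arithmetic concrete, here is a minimal userspace sketch of how
long a fixed jiffy count lasts at different HZ values. The helper name
ticks_to_ms is hypothetical (it is not a kernel API); it only models the
conversion discussed above.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel function: duration in milliseconds
 * of a sleep lasting `jiffies` timer ticks when the timer runs at
 * `hz` ticks per second. Integer division truncates.
 */
unsigned long ticks_to_ms(unsigned long jiffies, unsigned long hz)
{
    assert(hz > 0);
    return jiffies * 1000UL / hz;
}
```

Under this model, ticks_to_ms(1, 100) is 10 and ticks_to_ms(1, 1000) is 1,
so the same schedule_timeout(1) call sleeps ten times longer on an HZ==100
kernel than on an HZ==1000 one.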

> If they really meant to use schedule_timeout(1) in the sense of
> highest resolution delay possible (the latter above), then they
> probably should just call schedule() directly.

argh. Never do that. It's basically a busywait and can cause lockups if
the calling task has realtime scheduling policy.

> schedule_timeout(1)
> simply sets up a timer to fire off after 1 jiffy & then calls
> schedule() itself. The overhead of setting up a timer and the
> execution of schedule() itself probably means that the timer will go
> off in the middle of the schedule() call or very shortly thereafter (I
> think). In which case, it makes more sense to use schedule()
> directly...
> If they meant to schedule a delay of 10ms, then msleep() should be
> used in those cases. msleep() will also resolve the issues with 0-time
> timeouts because of rounding, as it adds 1 to the converted parameter.
> Obviously, changing more and more sleeps to msecs & secs will really
> help make the changing of HZ more transparent. And specifying the time
> in real time units just seems so much clearer to me.
> What do people think?

I'd say that replacing them with msleep(10) is the safest approach.
Depending on what the surrounding code is actually doing, of course.
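
As a rough illustration of the rounding point raised in the quoted text,
the sketch below models two ways of turning milliseconds into a jiffy
count. It is a userspace model under stated assumptions, not the kernel
source: naive_timeout and msleep_timeout are hypothetical names, and the
round-up formula is assumed to match what the conversion does for HZ
values that divide 1000.

```c
#include <assert.h>

/*
 * Hypothetical model of an open-coded conversion such as
 * schedule_timeout(ms * HZ / 1000): integer division truncates,
 * so short delays can collapse to a 0-jiffy timeout.
 */
unsigned long naive_timeout(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return ms * hz / 1000UL;
}

/*
 * Hypothetical model of the timeout msleep() is described as using:
 * convert milliseconds to jiffies rounding up (assumed formula),
 * then add 1 so the sleep is never shorter than requested.
 */
unsigned long msleep_timeout(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999UL) / 1000UL + 1UL;
}
```

In this model a 5 ms delay at HZ==100 truncates to 0 jiffies with the naive
conversion, while the msleep-style conversion yields 2 jiffies. A 10 ms
delay gives 2 jiffies at HZ==100 and 11 jiffies at HZ==1000, so the
requested time is honoured at either setting.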
