Re: [PATCH] sky2: Use deferrable timer for watchdog
From: Parag Warudkar
Date: Thu Dec 20 2007 - 15:36:33 EST
On Dec 20, 2007 3:04 PM, Arjan van de Ven <arjan@xxxxxxxxxxxxxxx> wrote:
> > I think it is reasonable for Network driver watchdogs to use a
> > deferrable timer - if the machine is 100% IDLE there is no one needing
> > the network to be up. If there is something running even on the other
> > CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
> > which will make it very likely in practice that each CPU will be
> > interrupted in reasonable amount of time.
>
> this is not correct; many machines are idle waiting for network data. Think of webservers...
Yes, I forgot the receive case. So if a server was 100% IDLE and a web
server was listening for network data and we reach 0 wakeups per
second on the CPU where the network watchdog timer is scheduled to run
deferred _and_ the network link went down, it would cause the watchdog
to not run and redo the link until some one else wakes up that CPU
later.
So as long as we make sure we don't convert every timer to deferrable
we should be ok - may be this can be resolved easily by having a
non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
that just causes at least one wake up in some reasonable time delta
from the previous wakeup (whoever caused that one.) It is still
beneficial in that all deferrable timers would run at once without
needing to have separate wakeup for each.
>
> >
> > Of course there are theoretical cases where we could land into a
> > situation where a CPU in a multiprocessor machine is IDLE infinitely
> > and that causes the watchdog that happens to be bound to run on the
> > same CPU to not run. To take care of these unlikely cases I think the
> > timer mechanism should have a reasonable limit on how long a CPU can
> > go IDLE if there are deferrable timers.
>
> how about something else instead: a timer mechanism that takes a range instead..
> that at least has defined semantics; the deferrable semantics really are "indefinite".
> Lets keep at least the semantics clear and clean.
>
Would not the simpler solution of installing a non-deferrable timer
per cpu which will not allow the CPU to go IDLE for more than x units
of time at once (or something to that effect) work? Range would
complicate the thing and I am not sure how many cases will know
reasonably correct range for their normal operation. In this instance
of the e1000 watchdog what range could it give and be successful at
what it wants to do - bring up the link in reasonable amount of time,
while also realizing the power savings?
Perhaps depending on Server/Laptop/Desktop machine (may be based on
Preemption) we could have normal or deferrable timers but that'll
exclude Servers from power savings and I am not sure Data center folks
will like that :) .
Parag
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/