Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

From: Thomas Gleixner
Date: Thu Jul 09 2009 - 08:04:27 EST


On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> >
> > I have the feeling that the code relies on some implicit cpu
> > boundness, which is no longer guaranteed with the timer migration
> > changes, but that's a question for the network experts.
>
> As a matter of fact, I've just looked at this __netif_schedule(),
> which really is cpu bound, so you might be 100% right.

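So the qdisc watchdog timer is the one which causes the trouble. It is
armed with HRTIMER_MODE_ABS, so with timer migration it is no longer
guaranteed to expire on the CPU which armed it. The patch below pins
the timer to that CPU and should fix this.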
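
To illustrate the CPU boundness issue, here is a minimal user-space sketch,
not kernel code. All names in it (pending_on_cpu, expiry_cpu, watchdog_fire)
are hypothetical, and it assumes only that an unpinned timer may expire on a
different CPU than the one which armed it, while the callback queues work on
whatever CPU it happens to run on:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical stand-in for a per-CPU list of qdiscs waiting to run. */
static int pending_on_cpu[NR_CPUS];

/*
 * Model of where a timer callback runs.  A pinned timer expires on the
 * CPU which armed it; an unpinned one may be moved to migration_target.
 */
static int expiry_cpu(int arming_cpu, int migration_target, bool pinned)
{
	assert(arming_cpu >= 0 && arming_cpu < NR_CPUS);
	assert(migration_target >= 0 && migration_target < NR_CPUS);
	return pinned ? arming_cpu : migration_target;
}

/* Model of a watchdog callback which queues work on the CPU it runs on. */
static int watchdog_fire(int arming_cpu, int migration_target, bool pinned)
{
	int cpu = expiry_cpu(arming_cpu, migration_target, pinned);

	pending_on_cpu[cpu]++;
	return cpu;
}
```

With pinned set, the work always lands on the arming CPU; without it, the
work can land on migration_target, which is the situation code with an
implicit CPU boundness assumption does not expect.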

Thanks,

tglx
---

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 24d17ce..fbe554f 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -485,7 +485,7 @@ void qdisc_watchdog_schedule(struct qdisc_watchdog *wd, psched_time_t expires)
 	wd->qdisc->flags |= TCQ_F_THROTTLED;
 	time = ktime_set(0, 0);
 	time = ktime_add_ns(time, PSCHED_TICKS2NS(expires));
-	hrtimer_start(&wd->timer, time, HRTIMER_MODE_ABS);
+	hrtimer_start(&wd->timer, time, HRTIMER_MODE_ABS_PINNED);
 }
 EXPORT_SYMBOL(qdisc_watchdog_schedule);
