timer race condition? (was: RE: L68K: Re: IDE-Driver Update ::

Geert Uytterhoeven (Geert.Uytterhoeven@cs.kuleuven.ac.be)
Tue, 25 Aug 1998 22:42:14 +0200 (CEST)


On Tue, 25 Aug 1998, Geert Uytterhoeven wrote:
> I added a test for jiffies < hwgroup->timer.expires at the top of
> ide_timer_expiry(). If positive, I do printk and just leave the function,
> hoping that things will go on correctly. Unfortunately they don't. Sometimes it
> gets stuck on the first case, in other cases it dumps a lot of messages and
> hangs.
>
> Anyway, it's clear that timer functions are called before they have expired.
> I'll dive into the timer code.

Well, this started as an IDE problem, but now it seems to have become a timer
problem.

Timer functions may be called _before_ they are expired (e.g. immediately),
causing problems in the IDE driver.

For debugging, I used the following patch:

--- kernel/sched.c.orig Tue Aug 11 19:18:28 1998
+++ kernel/sched.c Tue Aug 25 21:47:45 1998
@@ -392,6 +392,10 @@
{
unsigned long flags;

+if (timer->next || timer->prev)
+ printk("add_timer: timer = %p, next = %p, prev = %p\n", timer, timer->next,
+ timer->prev);
+
spin_lock_irqsave(&timerlist_lock, flags);
internal_add_timer(timer);
spin_unlock_irqrestore(&timerlist_lock, flags);
@@ -783,6 +787,7 @@
while ((timer = tv1.vec[tv1.index])) {
void (*fn)(unsigned long) = timer->function;
unsigned long data = timer->data;
+printk("jiffies = %ld timer_jiffies = %ld expires = %ld\n", jiffies, timer_jiffies, timer->expires);
detach_timer(timer);
timer->next = timer->prev = NULL;
spin_unlock_irq(&timerlist_lock);

It's not that the IDE driver modifies timer->expires and calls add_timer()
without having removed the timer with del_timer() before. The test for
timer->next || timer->prev would catch that case.

Anyone with a clue? This took me already +24 hours of debugging and fscking.

Output result:

First log:

| ...
| jiffies = 24048 timer_jiffies = 24048 expires = 24048
| jiffies = 24065 timer_jiffies = 24065 expires = 24065
| jiffies = 24073 timer_jiffies = 24072 expires = 24072
| jiffies = 24076 timer_jiffies = 24076 expires = 24076
| jiffies = 24077 timer_jiffies = 24077 expires = 24077
| jiffies = 24104 timer_jiffies = 24104 expires = 24104
| jiffies = 24130 timer_jiffies = 24129 expires = 24129
| jiffies = 24141 timer_jiffies = 24141 expires = 24141
| jiffies = 24141 timer_jiffies = 24141 expires = 25141
^^^^^ ^^^^^
| IGNORING: now = 24141 < 25141 = expire time
| jiffies = 24141 timer_jiffies = 24141 expires = 25141
| IGNORING: now = 24141 < 25141 = expire time
| jiffies = 24141 timer_jiffies = 24141 expires = 25141
| IGNORING: now = 24141 < 25141 = expire time
| jiffies = 24141 timer_jiffies = 24141 expires = 25141
| IGNORING: now = 24142 < 25141 = expire time
| jiffies = 24142 timer_jiffies = 24141 expires = 25141
| IGNORING: now = 24142 < 25141 = expire time
| jiffies = 24142 timer_jiffies = 24141 expires = 25141
| IGNORING: now = 24142 < 25141 = expire time
| jiffies = 24142 timer_jiffies = 24141 expires = 25141
| IGNORING: now = 24142 < 25141 = expire time
| jiffies = 24142 timer_jiffies = 24141 expires = 25141
| IGNORING: now = 24143 < 25142 = expire time
| jiffies = 24143 timer_jiffies = 24141 expires = 25142
| IGNORING: now = 24143 < 25142 = expire time
| jiffies = 24143 timer_jiffies = 24141 expires = 25142
| IGNORING: now = 24143 < 25142 = expire time
| jiffies = 24143 timer_jiffies = 24141 expires = 25142
| IGNORING: now = 24143 < 25142 = expire time
| jiffies = 24143 timer_jiffies = 24141 expires = 25142
| IGNORING: now = 24143 < 25142 = expire time
| jiffies = 24144 timer_jiffies = 24141 expires = 25142
| IGNORING: now = 24145 < 25145 = expire time
| jiffies = 24145 timer_jiffies = 24141 expires = 25145
| IGNORING: now = 24146 < 25146 = expire time
| jiffies = 24147 timer_jiffies = 24141 expires = 25147
| IGNORING: now = 24153 < 25153 = expire time
| jiffies = 24153 timer_jiffies = 24141 expires = 25153
| IGNORING: now = 24154 < 25153 = expire time
| jiffies = 24154 timer_jiffies = 24141 expires = 25153
| IGNORING: now = 24155 < 25153 = expire time
| jiffies = 24160 timer_jiffies = 24141 expires = 25160
| IGNORING: now = 24161 < 25160 = expire time
| jiffies = 24161 timer_jiffies = 24141 expires = 25160
| jiffies = 24162 timer_jiffies = 24141 expires = 25160
| jiffies = 24163 timer_jiffies = 24141 expires = 25160
| jiffies = 24163 timer_jiffies = 24141 expires = 25160
| jiffies = 24164 timer_jiffies = 24141 expires = 25160
| jiffies = 24164 timer_jiffies = 24141 expires = 25160
| jiffies = 24165 timer_jiffies = 24141 expires = 25160
| ...

Things don't change after jiffies got larger than 25160, but I ran out of
memory for the log (>128K).

The `IGNORING' message is from the test I added to ide_timer_expiry(). IDE uses
timeout values of 10*HZ (i.e. 1000 on my machine).

Second log:

| ...
| jiffies = 30977 timer_jiffies = 30977 expires = 30977
| jiffies = 30979 timer_jiffies = 30979 expires = 30979
| jiffies = 30987 timer_jiffies = 30987 expires = 30987
| jiffies = 31069 timer_jiffies = 31069 expires = 31069
| jiffies = 31075 timer_jiffies = 31075 expires = 31075
| jiffies = 31079 timer_jiffies = 31079 expires = 31079
| jiffies = 31083 timer_jiffies = 31083 expires = 31083
| jiffies = 31085 timer_jiffies = 31085 expires = 31085
| jiffies = 31087 timer_jiffies = 31087 expires = 31087
| jiffies = 31095 timer_jiffies = 31093 expires = 31093
| jiffies = 31108 timer_jiffies = 31108 expires = 31091
| jiffies = 31108 timer_jiffies = 31108 expires = 31112
| jiffies = 31108 timer_jiffies = 31108 expires = 31112
| jiffies = 31108 timer_jiffies = 31108 expires = 31112
| jiffies = 31108 timer_jiffies = 31108 expires = 31112
| jiffies = 31108 timer_jiffies = 31108 expires = 31112
| jiffies = 31108 timer_jiffies = 31108 expires = 31112
| jiffies = 31108 timer_jiffies = 31108 expires = 31112
| jiffies = 31109 timer_jiffies = 31108 expires = 31113
| jiffies = 31109 timer_jiffies = 31108 expires = 31113
| jiffies = 31109 timer_jiffies = 31108 expires = 31113
| jiffies = 31109 timer_jiffies = 31108 expires = 31113
| jiffies = 31109 timer_jiffies = 31108 expires = 31113
| jiffies = 31109 timer_jiffies = 31108 expires = 31113
| jiffies = 31109 timer_jiffies = 31108 expires = 31113
| jiffies = 31110 timer_jiffies = 31108 expires = 31114
| jiffies = 31110 timer_jiffies = 31108 expires = 31114
| jiffies = 31110 timer_jiffies = 31108 expires = 31114
| jiffies = 31110 timer_jiffies = 31108 expires = 31114
| jiffies = 31110 timer_jiffies = 31108 expires = 31114
| jiffies = 31110 timer_jiffies = 31108 expires = 31114
| jiffies = 31110 timer_jiffies = 31108 expires = 31114

This is a bit different: timer_jiffies gets stuck and is no longer incremented.

My machine is an Amiga 4000 with 25 MHz 68040, running Linux/m68k 2.1.115 (with
2.1.117+ IDE). I don't think it's a late 2.1.x problem. I have been seeing the
symptoms since months.

Greetings,

Geert

--
Geert Uytterhoeven                     Geert.Uytterhoeven@cs.kuleuven.ac.be
Wavelets, Linux/{m68k~Amiga,PPC~CHRP}  http://www.cs.kuleuven.ac.be/~geert/
Department of Computer Science -- Katholieke Universiteit Leuven -- Belgium

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.altern.org/andrebalsa/doc/lkml-faq.html