Re: [patch] SMP race fix [was Re: SMP lockup & 3c509 on 2.2.x [aka. the Deadly 'ping -f']]

Linus Torvalds (torvalds@transmeta.com)
Wed, 5 May 1999 08:23:02 -0700 (PDT)


On Wed, 5 May 1999, Andrea Arcangeli wrote:
>
> Ah! The code above is showing some bad design of the IRQ_INPROGRESS stuff.

Andrea, I think you're just guessing at what's going on again. Don't
just guess.

IRQ_INPROGRESS works the way it works on purpose: it's only cleared after
a successful execution of the irq handler - and it is left pending if
there was no irq handler etc..

Your "while ()" was not reliable, and that's the part that needs fixing.
For example, your while patch will loop forever if an irq tries to
synchronize with itself (instead of just returning like the original code
did). Don't try to fix it up by changing code that works fine.

> This my second incremental patch make sure to have a relialable
> IRQ_INPROGRESS value, to allow us to safely loop while(in_progress) at the
> end of disable_irq().

See above why it's not safe at all, even with your changes.

> Really with this new patch I removed the need to clear the IRQINPROGRESS
> bit upon irq_enable() and so I think this patch alone would just fix the
> SMP race (the problem is that we was doing a enable_irq while the irq was
> running and so we was allowing a second irq to reenter nested in the irq
> handler).

No.

If people are doing random enable_irq() calls, then everything will break
anyway.

"enable_irq()" is only valid after a "disable_irq()" has been called.
That's basically enforced by the "disable counter" setup, but it also
makes sense.

And "disable_irq()" will synchronously wait for the irq handler to have
exited (it has to - otherwise you'd see the case where somebody calls
disable_irq(), yet another CPU is still busy running that irq). So you
wouldn't see any nested irq handlers. Think about it.

Now, I fully expect that there can be (and is) a bug somewhere there, but
the way to fix it is not to make the above kind of guesses.

> Now I have to shutdown the computer (I had to go to University ...), I'll
> continue to think about the problem in the late evening.

Please do.

I generally have a problem with your patches: a lot of them are really
good. But at the same time a lot of them are changing things because
you're in "random walk" mode, and you just walk over the code without
really thinking about them. In the end, you then have a huge patch, and
nobody really knows which part of your patch is really the worthwhile
part. Which is then why it takes so long for some of your _real_ fixes to
get into a stable kernel..

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/