2.2.14 SMP add_wait_queue problem

From: V Ganesh (ganesh@veritas.com)
Date: Thu Apr 06 2000 - 02:19:13 EST


From: prasanna@veritas.com (Prasanna Narayana)
Hi,

Do you mind posting this to the linux-kernel mailing list?
Somehow we are unable to do that.

-- Prasanna
---------------------------- CUT HERE -----------------------------------
Subject: 2.2.14 SMP add_wait_queue problem

Hi,

We are facing a hard hang on a 2.2.14 SMP kernel while doing heavy
multi-threaded i/o using our software raid driver.

We have 10 kernel threads which mostly do

    get my_spinlock (spin_lock_irqsave)
    do some work
    add_wait_queue
    release my_spinlock
    schedule()
    get my_spinlock
    remove_wait_queue

    and repeat the cycle.

These are woken up from the i/o done path.
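
For readers who want to experiment with the shape of this cycle outside
the kernel, here is a rough userspace analogy using POSIX threads. It is
only a sketch under assumptions: it is not the driver code and not the
2.2.14 wait queue API, and it does not reproduce the hang. The names
my_lock, io_done, pending_wakeups, worker, completer and run_cycles are
all made up for illustration. pthread_cond_wait() stands in for the
"add_wait_queue, release my_spinlock, schedule(), get my_spinlock" steps,
which it performs as one atomic unlock-and-sleep followed by a re-lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins; none of these are kernel objects. */
static pthread_mutex_t my_lock = PTHREAD_MUTEX_INITIALIZER; /* plays my_spinlock */
static pthread_cond_t io_done = PTHREAD_COND_INITIALIZER;   /* plays the wait queue */
static int pending_wakeups; /* completions posted but not yet consumed */
static int cycles_done;     /* total sleep/wake cycles completed */

enum { NUM_WORKERS = 10, CYCLES_PER_WORKER = 100 };

/* One of the 10 threads: lock, work, sleep until woken, repeat. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < CYCLES_PER_WORKER; i++) {
        pthread_mutex_lock(&my_lock);            /* get my_spinlock */
        /* ... do some work ... */
        while (pending_wakeups == 0)             /* enqueue, unlock, sleep, re-lock */
            pthread_cond_wait(&io_done, &my_lock);
        pending_wakeups--;                       /* consume one completion */
        cycles_done++;
        pthread_mutex_unlock(&my_lock);
    }
    return NULL;
}

/* Plays the i/o done path: posts one wakeup per expected cycle. */
static void *completer(void *arg)
{
    (void)arg;
    for (int i = 0; i < NUM_WORKERS * CYCLES_PER_WORKER; i++) {
        pthread_mutex_lock(&my_lock);
        pending_wakeups++;
        pthread_cond_signal(&io_done);
        pthread_mutex_unlock(&my_lock);
    }
    return NULL;
}

/* Runs all threads to completion and returns the number of cycles seen. */
int run_cycles(void)
{
    pthread_t workers[NUM_WORKERS], done;
    int i, rc;

    pending_wakeups = 0;
    cycles_done = 0;
    for (i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_create(&workers[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    rc = pthread_create(&done, NULL, completer, NULL);
    assert(rc == 0);

    for (i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    pthread_join(done, NULL);
    return cycles_done;
}
```

The counter makes wakeups impossible to lose: a completion posted before
a worker goes to sleep is still seen when that worker checks the
condition under the lock. The kernel sequence above has separate
enqueue, unlock and sleep steps, so this analogy says nothing about
which lock the real threads are stuck on.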

Within a few seconds the machine locks up and nothing works. By adding
printk statements, we have found the following state:

    cpu 1                                    cpu 2
    -----                                    -----
    * get my_spinlock (spin_lock_irqsave)    * waiting for my_spinlock
    * do some work                             with spin_lock_irqsave(),
    * trying to do add_wait_queue              which also means
      but has not completed,                   disabled interrupts.
      probably because of not
      getting waitqueue_lock

Apart from add_wait_queue(), waitqueue_lock is used only
by remove_wait_queue() and __wake_up(). So why doesn't the thread
running on cpu 1 get this lock when cpu 2 is not executing
either remove_wait_queue() or __wake_up() ?

This is not a problem on UP, and it works ok on 2.3.xx (the wait queue
implementation is different there). Also, it does not look like a
hardware problem, as we see this on 3 different machines.

-- Prasanna
