Re: ide_do_drive_cmd() problems

Gadi Oxman (gadio@netvision.net.il)
Sun, 12 Sep 1999 18:31:23 +0300 (IDDT)


Hi,

The reason we added the cli() to ide_do_drive_cmd() before the down()
call originally (in 1.3.something, before Linux had SMP support), was
to minimize the time inside down() in which an interrupt completion
of the just queued request could call up() and race with us calling
down().

On a UP kernel, down(), eventually calling schedule, was performing an
implicit (single processor) sti() for us at a later stage. So if the
drive was able to complete the request immediately, the completion
interrupt would have been delayed a bit until that point of the implicit
sti().

On a SMP kernel, a global cli() does a local processor __cli(), and grabs
the global irq lock. down(), calling schedule(), would eventually release
the irq lock before selecting another process.

If, however, somewhere between the call to down() and the release of
the irq lock, someone calls local __sti() *and* unluckily, the request
could have been serviced by the drive quickly, we would attempt to grab
the irq lock on the processor which already owns it and deadlock.

In the path called by down(), I can see the run_task_queue() call,
which is called by schedule() before the release_kernel_lock() call.
This call, for example, can call unplug_device() for another IDE
interface which has queued requests, and there we would do an __sti()
in the ide_do_request() function.. Does moving the run_task_queue()
call after the release_kernel_lock() call makes any difference with
the original ide_do_drive_cmd()?

BTW, our attempt to minimize the race potential between down() and
up() is probably no longer needed with current kernels, as the semaphores
are SMP and interrupt safe. The important part is to install the
semaphore in the request before grabbing the io_request_lock and adding
it to the queue, and we should safely be able to remove the save_flags(),
cli() (both global and local) and restore_flags() around the call to
down(), and let the standard kernel up()/down() functions handle the
race between them.

On another point, the use of:

struct request *cur_rq = drive->queue;

in the rewritten function below outside of the io_request_lock()
part is dangerous, as an interrupt could occur just between the
above line and the spin_lock_irqsave(&io_request_lock, flags); line
and change cur_rq from under us.

Gadi

On Sun, 12 Sep 1999, Jens Axboe wrote:

> Hi all,
>
> I've been having problems with the above command in the
> ATAPI driver - it can cause complete lockups, no magic
> keys, just dead. I have been unable to find out what
> is causing these problems, so I wrote a replacement
> function. Why this fixes the problem I don't know?!
> Hints appreciated!
>
> /*
> * Like ide_do_drive_cmd(), but with ide_wait as ide_action. Only
> * disables interrupts locally.
> */
> int ide_do_drive_cmd_wait (ide_drive_t *drive, struct request *rq)
> {
> unsigned long flags;
> ide_hwgroup_t *hwgroup = HWGROUP(drive);
> unsigned int major = HWIF(drive)->major;
> struct request *cur_rq = drive->queue;
> struct semaphore sem = MUTEX_LOCKED;
>
> #ifdef CONFIG_BLK_DEV_PDC4030
> if (HWIF(drive)->chipset == ide_pdc4030 && rq->buffer != NULL)
> return -ENOSYS; /* special drive cmds not supported */
> #endif
>
> rq->errors = 0;
> rq->rq_status = RQ_ACTIVE;
> rq->rq_dev = MKDEV(major,(drive->select.b.unit)<<PARTN_BITS);
> rq->sem = &sem;
>
> spin_lock_irqsave(&io_request_lock, flags);
> if (cur_rq == NULL) {
> rq->next = cur_rq;
> drive->queue = rq;
> } else {
> while (cur_rq->next != NULL) /* find end of list */
> cur_rq = cur_rq->next;
> rq->next = cur_rq->next;
> cur_rq->next = rq;
> }
> spin_unlock_irqrestore(&io_request_lock, flags);
> do_hwgroup_request(hwgroup);
> __save_flags(flags);
> __cli();
> if (rq->rq_status != RQ_INACTIVE)
> down(&sem);
> __restore_flags(flags);
> return rq->errors ? -EIO : 0;
> }
>
> --
> * Jens Axboe <axboe@image.dk>
> * Linux CD-ROM Maintainer
> * http://www.kernel.dk
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/