Re: IDE tape and SMP

Gadi Oxman (gadio@netvision.net.il)
Thu, 29 Oct 1998 08:12:31 +0200 (IST)


Hi Mark,

> Gadi Oxman wrote:
> ...
> > I'm thinking about the latest ide-tape + SMP problems; it's hard for
> > me to fix, as I'm not very familiar with 2.1.x SMP. Mark/Scott/Erik and
> > myself developed the drivers without access to SMP systems, and although
> > I upgraded to a SMP system very recently, unfortunately I was not able
> > to repeat the latest SMP problems on my system yet.
> ...
> I am not going to get into this too much,
> but I would guess that the ide-tape *and* ide-cd *and* ide-floppy
> stuff should already be darned-near SMP-safe, since the entry
> points are all (?) through the SMP firewall code in ide.c.
>
> The intent of the fine-grained SMP support I added in ide.c
> was that it would protect all of the sub-drivers, provided they
> don't go out of their way to be foolish (except for ide-scsi).
>
> Interrupt masking shouldn't really be needed or matter in the
> sub-drivers.. most of it can probably just be "#if 0"'d .
>
> Are there actually reported problems (I wouldn't know)?
> Do they go away when the save_flags/cli/restore_flags is commented-out
> in ide-tape.c ?
> --
> mlord@pobox.com

The ide-tape + SMP problem reported by Matthew Hunter was that
after about one hour of backup, the backup process would lock
in the 'D' state.

Andre Hedrick and Alex Buell reported seemingly SMP-related
irq timeouts and DSC timeouts.

I'm not sure whether the second case is SMP-related, but I can
believe that the first is.

ide-tape implements a character device, so it is both a request-servicer
and a request-generator driver, whereas ide-floppy, for example, is mainly
just a request-servicer driver, with the buffer cache acting as the
request-generator (except for the ioctl path).

This is the origin of the race conditions -- ide-tape needs to
halt the request servicing for a while when looking at the queue.
When it is about to add a new request to the queue, it does something
like this:

	disable interrupts, preventing requests from being processed
	if the queue is full, install a semaphore in the next request
	to be serviced and sleep on it

The cli() here is crucial even for UP systems -- if we don't use it,
we can get an interrupt and complete the request after we verified that
the queue is full and before we installed the semaphore, and the
backup process will be locked in down(), waiting forever for a request
which was already completed.
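
To make the window concrete, here is a small userspace model of that
sequence. It is only a sketch and not the ide-tape code: struct request,
complete_request() and wait_for_slot() are hypothetical names, the
"interrupt" is simulated by an ordinary function call, and an int
counter stands in for the semaphore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued request; not the real structure. */
struct request {
    bool completed;
    int *waiter;        /* counter to "up()" on completion, or NULL */
};

/* What the interrupt handler does in this model: finish the request
 * and wake whoever installed a semaphore in it. */
static void complete_request(struct request *rq)
{
    rq->completed = true;
    if (rq->waiter)
        (*rq->waiter)++;            /* up() */
}

/* Models "queue is full, install a semaphore, sleep on it".
 * irq_masked == true models the cli() case; with false, the simulated
 * interrupt lands between the check and the install.
 * Returns true if the sleeper would be woken, false if it would sit
 * in down() forever. */
static bool wait_for_slot(struct request *rq, bool irq_masked)
{
    int sem = 0;
    bool irq_pending = true;        /* the request is about to complete */
    bool woken;

    assert(rq != NULL);

    /* Step 1: we have just seen that the queue is full. */
    if (!irq_masked && irq_pending) {
        complete_request(rq);       /* interrupt sneaks into the window */
        irq_pending = false;
    }

    /* Step 2: install the semaphore in the request. */
    rq->waiter = &sem;

    /* Step 3: interrupts are enabled again; a deferred one is delivered. */
    if (irq_pending)
        complete_request(rq);

    woken = sem > 0;                /* would down() ever return? */
    rq->waiter = NULL;              /* sem goes out of scope below */
    return woken;
}
```

With irq_masked set, the completion is delivered only after the
semaphore is installed and the sleeper is woken. Without it, the request
is marked completed while nobody is registered to be woken, which is the
lock-up described above.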

cli() doesn't seem to disable interrupts on all processors in an SMP
system if local interrupts were already disabled. These look like very
strange SMP semantics.

I thought about replacing the cli() with
spin_lock_irqsave(&hwgroup->spinlock, flags) in the request-generator
part of the driver, but I'm not sure yet whether this is enough, since
the interrupt handler is called without the spinlock, and we might have
to acquire it there as well.
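
The concern about the handler can be illustrated with another userspace
model. Again this is a sketch under assumptions, not driver code: struct
hwgroup_model and all of the functions are hypothetical, a C11
atomic_flag stands in for hwgroup->spinlock, and the second CPU is
simulated by calling the handler at the point where it would run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in ide.c. */
struct hwgroup_model {
    atomic_flag lock;         /* stands in for hwgroup->spinlock */
    bool queue_full;
    bool waiter_installed;    /* semaphore attached to the next request */
    int  wakeups;             /* up() calls seen by the waiter */
};

static void model_init(struct hwgroup_model *hw)
{
    assert(hw != NULL);
    atomic_flag_clear(&hw->lock);
    hw->queue_full = true;
    hw->waiter_installed = false;
    hw->wakeups = 0;
}

static bool model_trylock(struct hwgroup_model *hw)
{
    return !atomic_flag_test_and_set(&hw->lock);
}

static void model_unlock(struct hwgroup_model *hw)
{
    atomic_flag_clear(&hw->lock);
}

/* Completion path as run by the interrupt handler on the other CPU.
 * use_lock == false models a handler that runs without the spinlock.
 * Returns false if the lock is held, i.e. the handler would spin. */
static bool handler_complete(struct hwgroup_model *hw, bool use_lock)
{
    if (use_lock && !model_trylock(hw))
        return false;
    hw->queue_full = false;
    if (hw->waiter_installed)
        hw->wakeups++;        /* up() on the installed semaphore */
    if (use_lock)
        model_unlock(hw);
    return true;
}

/* Request-generator path. Returns true if the sleeper would be woken. */
static bool producer_wait(struct hwgroup_model *hw, bool handler_uses_lock)
{
    bool done;

    while (!model_trylock(hw))
        ;                     /* spin_lock_irqsave() stand-in */

    /* Queue seen as full; the other CPU takes its interrupt right now. */
    done = handler_complete(hw, handler_uses_lock);
    hw->waiter_installed = true;
    model_unlock(hw);         /* spin_unlock_irqrestore() stand-in */

    if (!done)                /* handler stops spinning and runs now */
        handler_complete(hw, handler_uses_lock);

    return hw->wakeups > 0;   /* would down() ever return? */
}
```

In this model, locking only the request-generator side does not close
the window: a handler that ignores the lock can still complete the
request before the semaphore is installed. When the handler takes the
same lock, its completion is pushed past the install and the wakeup is
delivered.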

Gadi
