Re: [PATCH v3] sg: O_EXCL and other lock handling

From: Douglas Gilbert
Date: Thu Nov 14 2013 - 11:19:22 EST


On 13-11-12 07:58 PM, Douglas Gilbert wrote:
After feedback on version 2 and a new report of a failure
in the vicinity of sg_remove() [remove device] during
a shutdown on a large machine, the locking has been
revised again.

The shutdown problem in the vicinity of sg_remove() has
been traced to the st driver and a patch to fix st has
been sent to this list. So there are now no reported
problems against this patch.

Doug Gilbert

ChangeLog v3:
- change Sg_device::exclude and detached (renamed to
  detaching) to atomic_t
- introduce atomic_t Sg_device::open_cnt and use it for the
  open(O_EXCL) logic. This replaces the list_empty(sfds)
  check, which decouples the open/release logic from
  sg_remove_device() and other post-release cleanup
  functions
- use a mutex to stop races between sg_open() and
  sg_release() on the same device
- reduce the use of the driver-wide sg_index_lock so that
  it now only protects sg_index_idr (the device array)
- extend the cleanups requested by checkpatch.pl to the
  remaining code in the driver
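
To make the open_cnt idea above concrete, here is a rough userspace
model of the non-blocking case. It is not code from the patch: the
struct and function names (sg_model, sg_model_open, sg_model_release)
are made up for illustration, and a pthread mutex plus C11 atomics
stand in for the kernel mutex and atomic_t. It only shows how an open
counter can decide O_EXCL without looking at the sfds list.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the Sg_device fields named above. */
struct sg_model {
    pthread_mutex_t open_rel_lock; /* serializes open against release */
    atomic_int open_cnt;           /* number of current opens */
    atomic_int exclude;            /* 1 while an O_EXCL open holds the device */
    atomic_int detaching;          /* 1 once device removal has started */
};

static void sg_model_init(struct sg_model *d)
{
    pthread_mutex_init(&d->open_rel_lock, NULL);
    atomic_init(&d->open_cnt, 0);
    atomic_init(&d->exclude, 0);
    atomic_init(&d->detaching, 0);
}

/* Non-blocking open: returns 0 on success or a negative errno value. */
static int sg_model_open(struct sg_model *d, bool excl)
{
    int res = 0;

    pthread_mutex_lock(&d->open_rel_lock);
    if (atomic_load(&d->detaching)) {
        res = -ENODEV;              /* device is going away */
    } else if (atomic_load(&d->exclude)) {
        res = -EBUSY;               /* somebody holds it exclusively */
    } else if (excl && atomic_load(&d->open_cnt) > 0) {
        res = -EBUSY;               /* O_EXCL needs zero other opens */
    } else {
        if (excl)
            atomic_store(&d->exclude, 1);
        atomic_fetch_add(&d->open_cnt, 1);
    }
    pthread_mutex_unlock(&d->open_rel_lock);
    return res;
}

static void sg_model_release(struct sg_model *d)
{
    pthread_mutex_lock(&d->open_rel_lock);
    assert(atomic_load(&d->open_cnt) > 0);
    atomic_fetch_sub(&d->open_cnt, 1);
    /* If exclude was set, the releasing opener was the only one. */
    atomic_store(&d->exclude, 0);
    pthread_mutex_unlock(&d->open_rel_lock);
}
```

In this model the removal path only has to set detaching; it never
needs to inspect per-open state to answer "is anyone else open?".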

ChangeLog v2:
- favour non-O_EXCL open()s over open(dev, O_EXCL)s
- wake all open(dev)s if dev is removed (detached)
- wake all read(dev_fd)s that are waiting for a response
  if dev is removed (detached)
- other cleanups requested by checkpatch.pl
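
The "wake all waiters on removal" items above follow a common
pattern, sketched below in userspace C. Again this is only an
illustration and not the driver code: sg_wait_model and its functions
are hypothetical names, and a pthread condition variable stands in
for the kernel wait queue. A waiter sleeps until either a response
arrives or the device is marked as detaching, and detach wakes every
waiter so none is left blocked on a device that has gone away.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one device with readers waiting on it. */
struct sg_wait_model {
    pthread_mutex_t lock;
    pthread_cond_t  wq;         /* stands in for a kernel wait queue */
    bool response_ready;
    bool detaching;
};

static void sg_wait_model_init(struct sg_wait_model *d)
{
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->wq, NULL);
    d->response_ready = false;
    d->detaching = false;
}

/* Blocks until a response is ready (returns 0) or the device is
 * being removed (returns -ENODEV). */
static int sg_wait_model_read(struct sg_wait_model *d)
{
    int res;

    pthread_mutex_lock(&d->lock);
    while (!d->response_ready && !d->detaching)
        pthread_cond_wait(&d->wq, &d->lock);
    if (d->detaching) {
        res = -ENODEV;
    } else {
        d->response_ready = false;  /* consume the response */
        res = 0;
    }
    pthread_mutex_unlock(&d->lock);
    return res;
}

/* A response arrived: wake one waiter. */
static void sg_wait_model_complete(struct sg_wait_model *d)
{
    pthread_mutex_lock(&d->lock);
    d->response_ready = true;
    pthread_cond_signal(&d->wq);
    pthread_mutex_unlock(&d->lock);
}

/* Device removal: set the flag, then wake every waiter. */
static void sg_wait_model_detach(struct sg_wait_model *d)
{
    pthread_mutex_lock(&d->lock);
    d->detaching = true;
    pthread_cond_broadcast(&d->wq);
    pthread_mutex_unlock(&d->lock);
}
```

The broadcast matters here: a single wake-up would leave the other
waiters asleep with nothing left to wake them.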

ChangeLog v1:
- introduce a finer-grained (per device) lock to protect
  access and changes to the file descriptor objects
- introduce a semaphore for mutual exclusion of coincident
  open and release calls to the same device
- improve the O_EXCL handling of sg_open() when multiple
  callers are waiting for an O_EXCL condition to clear
- change some seq_printf()s to seq_puts()s as requested
  by checkpatch.pl
- update copyright notice, version number and date


The patch is against lk 3.12.0 (and should work on lk 3.10
and lk 3.11, as the sg driver has not changed across those
releases).

Testing is ongoing (see the v2 post) with focus on host
removal and shutdown. The driver survives bombarding 4 LUs
with queued requests spread across 6000 scsi_debug LUs.
Some log noise is generated, but it is not from the sg
driver:
    scsi 9:0:33:3: rejecting I/O to offline device
    scsi 9:0:33:3: [sg1000] killing request
    <multiple times>

This is not seen when there are only 600 LUs.


Signed-off-by: Douglas Gilbert <dgilbert@xxxxxxxxxxxx>
