[PATCH scsi-misc-2.6 00/08] scsi: small fixes & cleanups

From: Tejun Heo
Date: Tue Mar 22 2005 - 21:18:32 EST


Hello, James.
Hello, Jens.

These are series of small fixes & cleanups. The last two patches
deal with reference counting and hot unplugging oops. Patches are
against scsi-misc-2.6 tree (this is the devel tree, right?).

Jens, please try #08 and tell me if you still get oops. AFAICT,
reference counting isn't the issue here. We're over-counting not
under-counting (see description of #07).

All compile cleanly and I haven't found any problem yet. USB
hot-unplugging doesn't create oops or offline device anymore for me.

[ Start of patch descriptions ]

01_scsi_remove_scsi_release_buffers.patch
: remove unused bounce-buffer release path

Buffer bouncing hasn't been done inside the scsi midlayer for
quite sometime now, but bounce-buffer release paths are still
around. This patch removes these unused paths.

02_scsi_no_special_on_requeue.patch
: don't use blk_insert_request() for requeueing

blk_insert_request() has 'reinsert' argument, which, when set,
turns on REQ_SPECIAL and REQ_SOFTBARRIER and requeues the
request. SCSI midlayer was the only user of this feature and
all requeued requests become special requests defeating
quiesce state. This patch makes scsi midlayer use
blk_requeue_request() for requeueing and removes 'reinsert'
feature from blk_insert_request().

Note: In drivers/scsi/scsi_lib.c, scsi_single_lun_run() and
scsi_run_queue() are moved upward unchanged.

03_scsi_remove_internal_timeout.patch
: remove unused scsi_cmnd->internal_timeout field

scsi_cmnd->internal_timeout field doesn't have any meaning
anymore. Kill the field.

04_scsi_remove_volatile.patch
: remove meaningless volatile qualifiers from structure definitions

scsi_device->device_busy, Scsi_Host->host_busy and
->host_failed have volatile qualifiers, but the qualifiers
don't serve any purpose. Kill them. While at it, protect
->host_failed update in scsi_error for consistency and clarity.

05_scsi_timer_cleanup.patch
: remove a timer race from scsi_queue_insert() and cleanup timer

scsi_queue_insert() has four callers. Three callers call with
timer disabled and one (the second invocation in
scsi_dispatch_cmd()) calls with timer activated.
scsi_queue_insert() used to always call scsi_delete_timer()
and ignore the return value. This results in race with timer
expiration. Remove scsi_delete_timer() call from
scsi_queue_insert() and make the caller delete timer and check
the return value.

While at it, as, once a scsi timer is added, it should expire
or be deleted before reused, make scsi_add_timer() strict
about timer reuses. Now timer expiration function clears
->function and all timer deletion should go through
scsi_delete_timer(). Also, remove bogus ->eh_action tests
from scsi_eh_{done|times_out} functions. The condition is
always true and the test is somewhat misleading.

06_scsi_remove_serial_number_at_timeout.patch
: remove meaningless scsi_cmnd->serial_number_at_timeout field

scsi_cmnd->serial_number_at_timeout doesn't serve any purpose
anymore. All serial_number == serial_number_at_timeout tests
are always true in abort callbacks. Kill the field. Also, as
->pid always equals ->serial_number and ->serial_number
doesn't have any special meaning anymore, update comments
above ->serial_number accordingly. Once we remove all uses of
this field from all lldd's, this field should go.

07_scsi_refcnt_cleanup.patch
: remove bogus {get|put}_device() calls

SCSI request submission paths can be categorized like the
following.

* through high-level driver (sd, st, sg...)
+ requests (fs / pc)
+ ioctls
+ flushes (issue_flush / barrier rqs)
+ backing dev (unplug fn / field referencing)
+ high-level specific (init / revalidation...)
* through scsi-midlayer
+ midlevel specific (init...)
+ transport specific (domain validations...)

All accesses either

* open high-level driver before submitting requests and
closes with no request left.
* get_device() before submitting requests and put_device()
with no request left.

So, basically, SCSI high-level object (scsi_disk) and
mid-level object (scsi_device) are reference counted by users,
not the requests they submit. Reference count cannot go zero
with active users and users cannot access the object once the
reference count reaches zero.

So, the {get/put}_device() calls in scsi_get_command() and
scsi_request_fn() are bogus and misleading. In addition,
get_device() cannot synchronize 1->0 and 0->1 transitions and
always returns the device pointer given as the argument. The
== NULL tests are just misleading.

08_scsi_hot_unplug_fix.patch
: fix hot unplug sequence

When hot-unplugging using scsi_remove_host() function (as usb
does), scsi_forget_host() used to be called before
scsi_host_cancel(). So, the device gets removed first without
request cleanup and scsi_host_cancel() never gets to call
scsi_device_cancel() on the removed devices. This results in
premature completion of hot-unplugging process with active
requests left in queue, eventually leading to hang/offlined
device or oops when the active command times out.

This patch makes scsi_remove_host() call scsi_host_cancel()
first such that the host is first transited into cancel state
and all requests of all devices are killed, and then, the
devices are removed. This patch fixes the oops in eh after
hot-unplugging bug.

[ End of patch descriptions ]

Thanks a lot.

--
tejun

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/