On Tue, 2005-04-19 at 23:31 +0900, Tejun Heo wrote:
scmd->eh_timeout is used to resolve the race between command
completion and timeout. However, during error handling,
scsi_send_eh_cmnd uses scmd->eh_timeout. This creates a race
condition between eh and normal completion for a request which
has timed out and in the process of error handling. If the
request completes while scmd->eh_timeout is being used by eh,
eh timeout is lost and the command will be handled by both eh
and completion path. This patch fixes the race by making
scsi_send_eh_cmnd() use its own timer.
This patch adds shost->eh_timeout field. The name of the
field equals scmd->eh_timeout which is used for normal command
timeout. As this can be confusing, renaming scmd->eh_timeout
to something like scmd->cmd_timeout would be good.
Reworked such that timeout race window is kept at minimal
level as pointed out by James Bottomley.
This looks fine in principle. However, three comments
1. If you're doing this, there's no further use for eh_timeout, so
remove it (and preferably fix gdth_proc.c; however, it's better to break
the compile of that driver than have it rely on a now defunct field).
2. Use of eh_action is private to scsi_error.c, so you don't need to add
a new field to the host, just make eh_action a pointer to a private
eh_action structure which contains the timer and the semaphore.
3. To close a really tiny window where the running timer could race with
the del_timer, it should probably be del_timer_sync(). The practical
effect of this is nil, but it would be correct programming.