Re: [PATCH 03/11] block: add rq->resid_len
From: Tejun Heo
Date:  Mon May 11 2009 - 20:24:00 EST
Hello, all.
James Bottomley wrote:
>>> Does resid_len make any sense w/ failed requests?  I think we would be
>>> better off with declaring residual count to be undefined on request
>>> failure.  Is there any place which depends on it?
>> IIRC, I wrote the code. I think that this doesn't matter but it's
>> better not to change the behavior unless Eric ack on this change
>> (maybe LSI has some management binary that assume this behavior though
>> it's unlikely).
> 
> Actually, yes it does, for many possible reasons.
> 
> The first being if the device is too stupid to report an actual sector
> location the next best way of determining where the error occurred is
> from the residual.  We don't make use of this in kernel (perhaps we
> should?) but some of the user space programs for CD/DVD burning do.
Really?  Residual count on command success is used but on failure?
That's a dangerous territory.  When a SG_IO fails, the only data the
app should be accessing is the sense data if the status indicates its
validity.  The problems with residual count on failed command are...
* Not well defined.  What does it mean really?  It can't indicate
  successful partial transfer.  If the request partially succeeded,
  the required behavior is to successfully complete the request
  partially with residual count and then fail the latter part when
  issued again.  If the failure applies to the whole request but
  location information is useful, it should be carried in the sense
  data.
* What about corner values?  What does 0 or full resid count on
  failure mean?
* Different layers of failing.  In SG_IO interface, a request may fail
  with -EIO way before it reaches block layer.  Residual count can't
  be set to any meaningful value in these cases.  We can set it to
  full count for these fast fail paths, but do we really wanna go
  there?  Another problem is when a driver is missing SG_IO
  capability.  Who's responsible for setting resid count in that case?
  How is upper layer gonna determine a SG_IO failed because lower
  level driver didn't support it or it genuinely failed?
I think it's just silly to give any meaning to resid count when the
request fails.  It's best to leave the field unmodified or just
declare it undefined.
Thanks.
-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/