Re: [BUG] Regression introduced with "block: split bios to max possible length"

From: Linus Torvalds
Date: Thu Jan 21 2016 - 20:12:21 EST

On Thu, Jan 21, 2016 at 2:51 PM, Keith Busch <keith.busch@xxxxxxxxx> wrote:
> My apologies for the trouble. I trust it really is broken, but I don't
> quite see how. The patch supposedly splits the transfer to the max size
> the request queue says it allows. How does the max allowed size end up
> an invalid multiple?

I assume that in this case it's simply that

- max_sectors is some odd number in sectors (ie 65535)

- the block size is larger than a sector (ie 4k)

- the device probably doesn't even have any silly chunk size issue
(so chunk_sectors is zero).

and because the block size is 4k, no valid IO can ever generate
anything but 4k-aligned IO's, and everything is fine.

Except now the "split bios" patch will split blindly at the
max_sectors size, which is pure and utter garbage, since it doesn't
take the minimum block size into account.

Also, quite frankly, I think that whole "split bios" patch is garbage *anyway*.

The thing is, the whole "blk_max_size_offset()" use there is broken.
What I think it _should_ do is:

(a) check against max sectors like it used to do:

if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
goto split;

(b) completely separately, and _independently_ of that max sector
check, it should check against the "chunk_sectors" limit if it exists.

instead, it uses that nasty blk_max_size_offset() crap, which is
broken because it's confusing, but also because it doesn't honor
max_sectors AT ALL if there is a chunking size.

So I think chunking size should be independent of max_sectors. I could
see some device that has some absolute max sector size, but that
_also_ wants to split so that the bio never crosses a particular chunk
size (perhaps due to RAID, perhaps due to some internal device block
handling rules).

Trying to mix the two things with those "blk_max_size_offset()" games
is just wrong.