Re: block: wrong return value by bio_end_sector?
From: Paolo Valente
Date: Sun Oct 02 2022 - 12:20:55 EST
> Il giorno 1 ott 2022, alle ore 02:50, Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> ha scritto:
>
> On 10/1/22 00:59, Paolo Valente wrote:
>> Hi Jens, Damien, all other possibly interested people, this is to raise
>> attention on a mistake that has emerged in a thread on a bfq extension
>> for multi-actuary drives [1].
>>
>> The mistake is apparently in the macro bio_end_sector (defined in
>> include/linux/bio.h), which seems to be translated (incorrectly) as
>> sector+size, and not as sector+size-1.
>
> This has been like this for a long time, I think.
>
>>
>> For your convenience, I'm pasting a detailed description of the
>> problem, by Tyler (description taken from the above thread [1]).
>>
>> The drive reports the actuator ranges as a starting LBA and a count of
>> LBAs for the range. If the code reading the reported values simply does
>> startingLBA + range, this is an incorrect ending LBA for that actuator.
>
> Well, yes. LBA 0 + drive capacity is also an incorrect LBA. If the code
> assumes that it is, you have a classic off-by-one bug.
>
>> This is because LBAs are zero indexed and this simple addition is not
>> taking that into account. The proper way to get the endingLBA is
>> startingLBA + range - 1 to get the last LBA value for where to issue a
>> final IO read/write to account for LBA values starting at zero rather
>> than one.
>
> Yes. And ? Where is the issue ?
>
>>
>> Here is an example from the output in SeaChest/openSeaChest:
>> ====Concurrent Positioning Ranges====
>>
>> Range# #Elements Lowest LBA # of LBAs 0
>> 1 0
>> 17578328064 1 1 17578328064
>> 17578328064
>>
>> If using the incorrect formula to get the final LBA for actuator 0, you
>> would get 17578328064, but this is the starting LBA reported by the
>> drive for actuator 1. So to be consistent for all ranges, the final LBA
>> for a given actuator should be calculated as starting LBA + range - 1.
>>
>> I had reached out to Seagate's T10 and T13 representatives for
>> clarification and verification and this is most likely what is causing
>> the error is a missing - 1 somewhere after getting the information
>> reported by the device. They agreed that the reporting from the drive
>> and the SCSI to ATA translation is correct.
>>
>> I'm not sure where this is being read and calculated, but it is not an
>> error in the low-level libata or sd level of the kernel. It may be in
>> bfq, or it may be in some other place after the sd layer. I know there
>> were some additions to read this and report it up the stack, but I did
>> not think those were wrong as they seemed to pass the drive reported
>> information up the stack.
>>
>> Jens, Damien, can you shed a light on this?
>
> I am not clear on what the problem is exactly. This all sound like a
> simple off-by-one issue if bfq support code. No ?
Yes, it's (also) in bfq code now; no, it's not only in bfq code.
The involved bfq code is a replica of the following original function
(living in block/blk-iaranges.c):
static struct blk_independent_access_range *
disk_find_ia_range(struct blk_independent_access_ranges *iars,
sector_t sector)
{
struct blk_independent_access_range *iar;
int i;
for (i = 0; i < iars->nr_ia_ranges; i++) {
iar = &iars->ia_range[i];
if (sector >= iar->sector &&
sector < iar->sector + iar->nr_sectors)
return iar;
}
return NULL;
}
bfq's replica simply contains also this warning, right before the return NULL:
WARN_ONCE(true,
"bfq_actuator_index: bio sector out of ranges: end=%llu\n",
end);
So, if this warning is triggered, then the sector does not belong to
any range.
That warning gets actually triggered [1], for a sector number that is
larger than the largest extreme (iar->sector + iar->nr_sectors) in the
iar data structure. The offending value resulted to be simply equal
to the largest possible value accepted by the iar data structure, plus
one.
So, yes, this is an off-by-one error. More precisely, there is a
mismatch between the above code (or the values stored the iar data
structure) and the value of bio_end_sector (the latter is passed as
input to the above function). bio_end_sector does not seem to give
the end sector of a request (equal to begin+size-1), as apparently
expected by the above code, but the sector after the last one (namely,
begin+size).
Should I express an opinion on where the error is, I would say that
the mistake is in bio_end_sector. But I could be totally wrong, as
I'm not a expert of either of the involved parts. In addition,
bio_end_sector is already in use, with its current formula, by other
parts of the block layer. If you guys (Damien, Jens, Tyler?, ...)
give us some guidance on what to fix exactly, we will be happy to make
a fix.
Thanks,
Paolo
>
>>
>> Thanks, Paolo
>>
>> [1] https://www.spinics.net/lists/kernel/msg4507408.html
>
> --
> Damien Le Moal
> Western Digital Research