Re: [PATCH v3 0/6] block: fix integrity offset/length conversions

From: Caleb Sander Mateos

Date: Fri Apr 17 2026 - 11:11:00 EST


On Thu, Apr 16, 2026 at 8:26 PM Martin K. Petersen
<martin.petersen@xxxxxxxxxx> wrote:
>
>
> Hi Caleb!
>
> > The block layer's integrity code currently sets the seed (initial
> > reference tag) in units of 512-byte sectors but increments it in units
> > of integrity intervals
>
> I don't necessarily agree with the premise that the seed needs to be
> expressed in any particular unit. The seed is a start value, nothing
> more.

NVM Command Set specification 1.1 section 5.3.3 requires the reference
tag to increment by 1 per logical block, so that seems to determine
the increment unit:

> If the Reference Tag Check bit of the PRCHK field is set to ‘1’ and the namespace is
> formatted for Type 1 or Type 2 protection, then the controller compares the Logical Block
> Reference Tag to the computed reference tag. The computed reference tag depends on
> the Protection Type:
> ▪ If the namespace is formatted for Type 1 protection, the value of the computed
> reference tag for the first logical block of the command is the value contained in
> the Initial Logical Block Reference Tag (ILBRT) or Expected Initial Logical Block
> Reference Tag (EILBRT) field in the command, and the computed reference tag is
> incremented for each subsequent logical block. The controller shall complete the
> command with a status of Invalid Protection Information if the ILBRT field or the
> EILBRT field does not match the value of the least significant bits of the SLBA field
> sized to the number of bits in the Logical Block Reference Tag (refer to section
> 5.3.1.4).
> Note: Unlike SCSI Protection Information Type 1 protection which implicitly uses
> the least significant four bytes of the LBA, the controller always uses the ILBRT or
> EILBRT field and requires the host to initialize the ILBRT or EILBRT field to the
> least significant bits of the LBA sized to the number of bits in the Logical Block
> Reference Tag when Type 1 protection is used.
> ▪ If the namespace is formatted for Type 2 protection, the value of the computed
> reference tag for the first logical block of the command is the value contained in
> the Initial Logical Block Reference Tag (ILBRT) or Expected Initial Logical Block
> Reference Tag (EILBRT) field in the command, and the computed reference tag is
> incremented for each subsequent logical block.

The ref tag used for a particular block needs to be consistent. Since
reftag(N) can be computed as reftag(M) + (N - M) whenever block N is
accessed as part of an I/O that begins at block M, the function must
have the form reftag(N) = N + c for some constant c. Thus, the ref tag
seed needs to be computed in units of logical blocks (integrity
intervals); no other unit (e.g. 512-byte sectors) works.
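A minimal sketch of that consistency argument (hypothetical helper
names, not kernel code): the tag a controller computes for a block
depends only on the I/O's initial tag and the block's offset within
the I/O, so the seed function must be linear in the block number.

```python
# Per NVMe NVM Command Set spec 5.3.3, the controller increments the
# computed reference tag by 1 per logical block after the initial tag.
def computed_reftag(initial_tag, offset_in_io):
    return initial_tag + offset_in_io

# If the seed for an I/O starting at block M is seed(M) = M + c, then
# block N gets tag seed(M) + (N - M) = N + c regardless of where the
# I/O started -- the consistency the argument above requires.
def seed(block, c=0):
    return block + c

N = 10
for M in (0, 3, 7, 10):
    assert computed_reftag(seed(M), N - M) == N  # same tag from any start
```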

To see the issue with the current approach, consider accessing LBA 1
on a device with a 4 KB logical block size. If the block is written by
a write that begins at LBA 0, its ref tag in the generated PI will be
1 (sector 0 + 1 integrity interval). If it's later read by a read
that starts at LBA 1, its expected ref tag will be 8 (sector 8 + 0
integrity intervals), and the auto-integrity code will fail the read
with a ref tag mismatch. That seems completely unworkable for a block
storage device.
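The arithmetic behind that example can be sketched as follows (assumed
helper names, not kernel code): with 4 KB blocks there are 8 sectors
per integrity interval, so a seed in 512-byte sectors disagrees with
an increment in integrity intervals.

```python
# 4 KiB logical blocks -> 8 sectors per integrity interval.
SECTORS_PER_BLOCK = 4096 // 512

def reftag_current(start_lba, offset_blocks):
    # Current (buggy) behavior described above: seed set in units of
    # 512-byte sectors, but incremented once per integrity interval.
    return start_lba * SECTORS_PER_BLOCK + offset_blocks

# Write covering LBAs 0-1: generated PI for LBA 1 carries tag 1.
write_tag = reftag_current(start_lba=0, offset_blocks=1)  # 0 + 1 = 1
# Read starting at LBA 1: expected tag for LBA 1 is 8.
read_tag = reftag_current(start_lba=1, offset_blocks=0)   # 8 + 0 = 8
assert write_tag != read_tag  # ref tag mismatch -> the read fails
```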

>
> We happen to set it to the block number in the block layer since we need
> to be able to know what to compare against on completion (for Type 1 +
> the restrictive Linux implementation of Type 2). But that does not imply
> that the seed needs to be specified in any particular unit. Submitters
> set the seed to whichever value makes sense to them (i.e. it could be
> the offset within a file as opposed to the eventual LBA on the backend

I agree, the seed doesn't need to match the final LBA, but it does
need to be in *units* of logical blocks, plus some constant offset.

> device). And then that seed is incremented by 1 for each integrity
> interval of data in the PI sent to/received from the device. The
> conversion between the submitter's view of what the first ref tag should
> be (i.e. seed) and what is required by the hardware (for instance lower
> 32 bits of device LBA) is the reason we perform remapping. The seed is
> intentionally different in the submitter's protection envelope compared
> to the device's protection envelope.
>
> Using the block layer block number as seed was just a convenience since
> that provided a predictable value for any I/O that had its PI
> autogenerated. I never intended for the actual LBA to be used as seed
> value on a 4Kn device. Initially we just used 0 as the seed. Leveraging
> the block number just added a bit of additional protection.
>
> I confess I haven't tested 4Kn in a while since things sort of converged
> on 512e. But I used to run nightly tests on a SCSI storage with 4Kn
> blocks just fine.
>
> > This looks to be a longstanding bug affecting block devices that
> > support integrity with block sizes > 512 bytes; I'm surprised it
> > wasn't noticed before.
>
> Are you seeing this with NVMe or SCSI?

With a ublk device. It should affect any block device that supports
integrity and has a logical block size > 512 bytes.

Best,
Caleb