Re: [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support

From: Eric Blake
Date: Wed Apr 03 2024 - 13:02:37 EST


On Wed, Apr 03, 2024 at 10:11:47AM -0400, Stefan Hajnoczi wrote:
..
> > > +static loff_t dm_blk_do_seek_hole_data(struct dm_table *table, loff_t offset,
> > > + int whence)
> > > +{
> > > + struct dm_target *ti;
> > > + loff_t end;
> > > +
> > > + /* Loop when the end of a target is reached */
> > > + do {
> > > + ti = dm_table_find_target(table, offset >> SECTOR_SHIFT);
> > > + if (!ti)
> > > + return whence == SEEK_DATA ? -ENXIO : offset;
> >
> > ...but this blindly returns offset for SEEK_HOLE, even when offset is
> > beyond the end of the dm. I think you want 'return -ENXIO;'
> > unconditionally here.
>
> If the initial offset is beyond the end of the table, then SEEK_HOLE
> should return -ENXIO. I agree that the code doesn't handle this case.
>
> However, returning offset here is correct when there is data at the end
> with SEEK_HOLE.
>
> I'll update the code to address the out-of-bounds offset case, perhaps
> by checking the initial offset before entering the loop.

You are correct that if we are on the second loop iteration of
SEEK_HOLE (because the first iteration saw all data), then we have
found the offset of the start of a hole and should return that offset,
not -ENXIO. This may be a case where we just have to be careful on
whether the initial pass might have any corner cases different from
later times through the loop, and that we end the loop with correct
results for both SEEK_HOLE and SEEK_DATA.

>
> >
> > > +
> > > + end = (ti->begin + ti->len) << SECTOR_SHIFT;
> > > +
> > > + if (ti->type->seek_hole_data)
> > > + offset = ti->type->seek_hole_data(ti, offset, whence);
> >
> > Are we guaranteed that ti->type->seek_hole_data will not return a
> > value exceeding end? Or can dm be used to truncate the view of an
> > underlying device, and the underlying seek_hold_data can now return an
> > answer beyond where dm_table_find_target should look for the next part
> > of the dm's view?
>
> ti->type->seek_hole_data() must not return a value larger than
> (ti->begin + ti->len) << SECTOR_SHIFT.

Worth adding as documentation then.

>
> >
> > In which case, should the blkdev_seek_hole_data callback be passed a
> > max size parameter everywhere, similar to how fixed_size_llseek does
> > things?
> >
> > > + else
> > > + offset = dm_blk_seek_hole_data_default(offset, whence, end);
> > > +
> > > + if (whence == SEEK_DATA && offset == -ENXIO)
> > > + offset = end;
> >
> > You have a bug here. If I have a dm contructed of two underlying targets:
> >
> > |A |B |
> >
> > and A is all data, then whence == SEEK_HOLE will have offset = -ENXIO
> > at this point, and you fail to check whether B is also data. That is,
> > you have silently treated the rest of the block device as data, which
> > is semantically not wrong (as that is always a safe fallback), but not
> > optimal.
> >
> > I think the correct logic is s/whence == SEEK_DATA &&//.
>
> No, with whence == SEEK_HOLE and an initial offset in A, the new offset
> will be (A->begin + A->end) << SECTOR_SHIFT. The loop will iterate and
> continue seeking into B.
>
> The if statement you commented on ensures that we also continue looping
> with whence == SEEK_DATA, because that would otherwise prematurely end
> with the new offset = -ENXIO.
>
> >
> > > + } while (offset == end);
> >
> > I'm trying to make sure that we can never return the equivalent of
> > lseek(dm, 0, SEEK_END). If you make my above suggested changes, we
> > will iterate through the do loop once more at EOF, and
> > dm_table_find_target() will then fail to match at which point we do
> > get the desired -ENXIO for both SEEK_HOLE and SEEK_DATA.
>
> Wait, lseek() is supposed to return the equivalent of lseek(dm, 0,
> SEEK_END) when whence == SEEK_HOLE and there is data at the end.

It was confusing enough for me to write my initial review, I apologize
if I'm making it harder for you. Yes, we want to ensure that:

off1 = lseek(fd, -1, SEEK_END);
off2 = off1 + 1; // == lseek(fd, 0, SEEK_END)

if off1 belongs to a data extent:
- lseek(fd, off1, SEEK_DATA) == off1
- lseek(fd, off1, SEEK_HOLE) == off2
- lseek(fd, off2, SEEK_DATA) == -ENXIO
- lseek(fd, off2, SEEK_HOLE) == -ENXIO

if off1 belongs to a hole:
- lseek(fd, off1, SEEK_DATA) == -ENXIO
- lseek(fd, off1, SEEK_HOLE) == off1
- lseek(fd, off2, SEEK_DATA) == -ENXIO
- lseek(fd, off2, SEEK_HOLE) == -ENXIO

Anything in my wall of text from the earlier message inconsistent with
this table can be ignored; but at the same time, I was not able to
quickly convince myself that your code properly had those properties,
even after writing up the table.

Reiterating what I said elsewhere, it may be smarter to document that
for callbacks, it is wiser to require intermediate behavior that the
input value 'offset' is always between the half-open range
[ti->begin<<SECTOR_SHIFT, (ti->begin+ti->len)<<SECTOR_SHIFT), and on
success, the output must be in the fully-closed range [offset,
(ti->begin+ti->len)<<SECTOR_SHIFT], errors like -EIO are permitted but
-ENXIO should not be returned; and let the caller worry about
synthesizing -ENXIO from that (since the caller knows whether or not
there is a successor ti where adjacency concerns come into play).

That is, we can never pass in off2 (beyond the bounds of the table),
and when passing in off1, I think this interface may be easier to work
with in the intermediate layers, even though it differs from the
lseek() interface above. For off1 in data:
- dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off1
- dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off2
and for a hole:
- dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off2
- dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off1

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org