Re: [PATCH v2] block: ublk: enable zoned storage support

From: Andreas Hindborg
Date: Fri Mar 03 2023 - 03:34:14 EST



Ming Lei <ming.lei@xxxxxxxxxx> writes:

> On Thu, Mar 02, 2023 at 02:28:33PM +0100, Andreas Hindborg wrote:
>>
>> Ming Lei <ming.lei@xxxxxxxxxx> writes:
>>
>> > On Thu, Mar 02, 2023 at 11:07:15AM +0100, Andreas Hindborg wrote:
>> >>
>> >> Ming Lei <ming.lei@xxxxxxxxxx> writes:
>> >>
>> >> > On Thu, Mar 2, 2023 at 5:02 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>> >> >>
>> >> >> On Thu, Mar 02, 2023 at 04:32:21PM +0800, Ming Lei wrote:
>> >> >> > On Thu, Mar 02, 2023 at 08:31:07AM +0100, Andreas Hindborg wrote:
>> >> >> > >
>> >> >>
>> >> >> ...
>> >> >>
>> >> >> > >
>> >> >> > > I agree about fetching more zones. However, it is no good to fetch up to
>> >> >> > > a max, since the requested zone report may be less than max. I was
>> >> >> >
>> >> >> > Short read should always be supported, so the interface may need to
>> >> >> > return how many zones in single command, please refer to nvme_ns_report_zones().
>> >> >>
>> >> >> blk_zone is part of uapi, maybe the short read can be figured out by
>> >> >> one all-zeroed 'blk_zone'? then no extra uapi data is needed for
>> >> >> reporting zones.
>> >> >
>> >> > oops, we have blk_zone_report data for reporting zones to userspace already,
>> >> > see blkdev_report_zones_ioctl(), then this way can be re-used for getting zone
>> >> > report from ublk server too, right?
>> >>
>> >> Yes that would be nice. But I did the report_zone command like a read
>> >> operation, so we are not currently copying any buffers to user space
>> >> when issuing the command, we just rely on the iod.
>> >
>> > What I meant is to reuse the format of blk_zone_report for returning
>> > multiple 'blk_zone' info in single command.
>> >
>> > The only change is that you need to allocate one bigger kernel buffer
>> > to hold more 'blk_zone' in single report zone request.
>> >
>> >> I think it would be
>> >> better to use the start_sectors and nr_sectors of the iod instead. Then
>> >> we don't have to copy the blk_zone_report. What do you think?
>> >
>> > For IN parameter of report zone command, you still can reuse
>> > blk_zone_report:
>> >
>> > struct blk_zone_report {
>> > 	__u64 sector;
>> > 	__u32 nr_zones;
>> > 	__u32 flags;
>> > };
>> >
>> > Just by using the 1st two 64b words of iod for holding 'blk_zone_report', and
>> > keep the iod->addr field not touched.
>>
>> I see. Would you make the first part of `struct ublksrv_io_desc` a union
>> for this, or would you just cast it at the use site?
>
> oops, you still need iod->op_flags for recognizing the io op, so just
> start_sector and nr_sectors can be used.

We do not actually need to pass the flags to user space, or back from
user space to the kernel, for a ublk zone report. They are currently
used to tell user space whether the zone report contains the capacity
field. We could exclude them from the ublk kabi, since the zone report
will always contain capacity. But it might be good to have a flags field
for future things.
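
To make the "no flags" variant concrete, here is a minimal user space
sketch. The struct mirrors the field layout of `struct ublksrv_io_desc`
with stdint types. The name `iod_sketch`, both helpers, and the reuse of
nr_sectors as a zone count are assumptions for illustration, not what
the patch does today.

```c
#include <stdint.h>

/* Local mirror of struct ublksrv_io_desc, for illustration only. */
struct iod_sketch {
	uint32_t op_flags;	/* op: bits 0-7, flags: bits 8-31 */
	uint32_t nr_sectors;	/* assumption: zone count for a zone report */
	uint64_t start_sector;	/* first sector to report from */
	uint64_t addr;		/* server buffer address, left untouched */
};

/*
 * Hypothetical helper: store the IN parameters of a zone report in the
 * iod. Only the op byte of op_flags is rewritten, so the upper flag
 * bits and the addr field keep their values.
 */
static void iod_set_zone_report(struct iod_sketch *iod, uint8_t op,
				uint64_t sector, uint32_t nr_zones)
{
	iod->op_flags = (iod->op_flags & ~0xffu) | op;
	iod->start_sector = sector;
	iod->nr_sectors = nr_zones;
}

/* Hypothetical helper: extract the op byte from op_flags. */
static uint8_t iod_op(const struct iod_sketch *iod)
{
	return iod->op_flags & 0xff;
}
```

With this layout no blk_zone_report copy is needed for the IN
parameters; a flags word would have to go somewhere else if we want one
later.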

> However, this way isn't good too, cause UBLK_IO_OP_DRV_IN is just mapped
> to 'report zone' command in your implementation, what if new pt request
> is required in future?

We are currently mapping REQ_OP_* 1:1 to UBLK_IO_OP_*. If we relax
this, we can have a UBLK_IO_OP_REPORT_ZONES.
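
A rough sketch of what relaxing the mapping could look like, written as
plain C so it can be compiled outside the kernel. All enum names and
values below are stand-ins I made up for illustration (including the
value for the report zones op and the notion of a driver-private
sub-command); none of them come from the patch or from ublk_cmd.h.

```c
#include <stdint.h>

/* Stand-ins for the kernel REQ_OP_* values; hypothetical. */
enum sketch_req_op {
	SKETCH_REQ_OP_READ,
	SKETCH_REQ_OP_WRITE,
	SKETCH_REQ_OP_DRV_IN,
};

/* Stand-ins for UBLK_IO_OP_*; REPORT_ZONES and its value are hypothetical. */
enum sketch_ublk_op {
	SKETCH_UBLK_IO_OP_READ = 0,
	SKETCH_UBLK_IO_OP_WRITE = 1,
	SKETCH_UBLK_IO_OP_REPORT_ZONES = 100,
};

/* Hypothetical driver-private sub-command carried by a DRV_IN request. */
enum sketch_drv_cmd {
	SKETCH_DRV_CMD_REPORT_ZONES,
	SKETCH_DRV_CMD_OTHER,
};

/*
 * Map a block layer op to a ublk op. DRV_IN is no longer passed through
 * 1:1; it is resolved to a dedicated ublk op per sub-command.
 * Returns the ublk op, or -1 if the request is not supported.
 */
static int sketch_map_op(enum sketch_req_op op, enum sketch_drv_cmd cmd)
{
	switch (op) {
	case SKETCH_REQ_OP_READ:
		return SKETCH_UBLK_IO_OP_READ;
	case SKETCH_REQ_OP_WRITE:
		return SKETCH_UBLK_IO_OP_WRITE;
	case SKETCH_REQ_OP_DRV_IN:
		if (cmd == SKETCH_DRV_CMD_REPORT_ZONES)
			return SKETCH_UBLK_IO_OP_REPORT_ZONES;
		return -1;
	}
	return -1;
}
```

A future pt request would then get its own sub-command and its own ublk
op, instead of overloading the DRV_IN mapping.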

>
> We need to think about how to support ublk pt request in generic way.

Another option is to allow REQ_OP_DRV_IN to pass a buffer to user space.
Instead of being similar to a read operation, it could be a combination of
a read and a write operation.

BR Andreas