Re: [PATCH 1/2] nvme: set io-scheduler requirement for ZNS

From: Kanchan Joshi
Date: Wed Aug 19 2020 - 06:32:30 EST


On Wed, Aug 19, 2020 at 3:08 PM Damien Le Moal <Damien.LeMoal@xxxxxxx> wrote:
>
> On 2020/08/19 18:27, Kanchan Joshi wrote:
> > On Tue, Aug 18, 2020 at 12:46 PM Christoph Hellwig <hch@xxxxxx> wrote:
> >>
> >> On Tue, Aug 18, 2020 at 10:59:35AM +0530, Kanchan Joshi wrote:
> >>> Set elevator feature ELEVATOR_F_ZBD_SEQ_WRITE required for ZNS.
> >>
> >> No, it is not.
> >
> > Are you saying MQ-Deadline (write-lock) is not needed for writes on ZNS?
> > I see that null-block zoned and SCSI-ZBC both set this requirement. I
> > wonder how it became different for NVMe.
>
> It is not required for an NVMe ZNS drive that has zone append native support.
> zonefs and upcoming btrfs do not use regular writes, removing the requirement
> for zone write locking.

I understand that if a particular user (zonefs, btrfs etc) is not
sending regular-write and sending append instead, write-lock is not
required.
But if that particular user or some other user (say F2FS) sends
regular write(s), write-lock is needed.
Above block-layer, both the opcodes REQ_OP_WRITE and
REQ_OP_ZONE_APPEND are available to be used by users. And I thought
write-lock is taken or not is a per-opcode thing and not per-user (FS,
MD/DM, user-space etc.), is not that correct? And MQ-deadline can
cater to both the opcodes, while other schedulers cannot serve
REQ_OP_WRITE well for zoned-device.

> In the context of your patch series, ELEVATOR_F_ZBD_SEQ_WRITE should be set only
> and only if the drive does not have native zone append support.

Sure I can keep it that way, once I get it right. If it is really not
required for native-append drive, it should not be here at the place
where I added.

> And even in that
> case, since for an emulated zone append the zone write lock is taken and
> released by the emulation driver itself, ELEVATOR_F_ZBD_SEQ_WRITE is required
> only if the user will also be issuing regular writes at high QD. And that is
> trivially controllable by the user by simply setting the drive elevator to
> mq-deadline. Conclusion: setting ELEVATOR_F_ZBD_SEQ_WRITE is not needed.

Are we saying applications should switch schedulers based on the write
QD (use any-scheduler for QD1 and mq-deadline for QD-N).
Even if it does that, it does not know what other applications would
be doing. That seems hard-to-get-right and possible only in a
tightly-controlled environment.



--
Joshi