Re: Please further explain Linux's "zoned storage" roadmap [was: Re: [PATCH v14 00/13] support zoned block devices with non-power-of-2 zone sizes]

From: Bart Van Assche
Date: Fri Sep 23 2022 - 12:21:20 EST


On 9/22/22 23:29, Matias Bjørling wrote:
With UFS, in the proposed copy I have (may have been changed) - there's
the concept of gap zones, which are zones that cannot be accessed by
the host. The gap zones are essentially "LBA fillers", enabling the
next writeable zone to start at an X * pow2 size offset. My
understanding is that this specific approach was chosen to simplify
standardization in UFS and avoid updating T10's ZBC with zone
capacity support.

While UFS would technically expose non-power of 2 zone sizes, these
could also, due to the gap zones, be considered power of 2 zones
if one considers the seq. write zone + the gap zone as a single
unit.

When I think about having UFS support in the kernel, the SWR and the
gap zone could be represented as a single unit. For example:

UFS - Zone Report
Zone 0: SWR, LBA 0-11
Zone 1: Gap, LBA 12-15
Zone 2: SWR, LBA 16-27
Zone 3: Gap, LBA 28-31
...

Kernel representation - Zone Report (as supported today)
Zone 0: SWR, LBA 0-15, Zone Capacity 12
Zone 1: SWR, LBA 16-31, Zone Capacity 12
...

Doing it this way removes the need for filesystems,
device-mappers, and user-space applications to understand gap
zones, and allows UFS to work out of the box with no changes to the
rest of the zoned storage eco-system.

Has the above representation been considered?

Hi Matias,

What has been described above is the approach from the first version of the zoned storage for UFS (ZUFS) draft standard. Support for this approach is available in the upstream kernel. See also "[PATCH v2 0/9] Support zoned devices with gap zones", 2022-04-21 (https://lore.kernel.org/linux-scsi/20220421183023.3462291-1-bvanassche@xxxxxxx/).
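
For reference, the merged representation from the quoted message can be sketched as follows. This is only an illustration in Python, not kernel code: the Zone and KernelZone types and merge_gap_zones() are hypothetical names, and the sketch assumes every gap zone directly follows the sequential write zone it pads.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    kind: str    # "SWR" or "GAP" (hypothetical labels for this sketch)
    start: int   # first LBA of the zone
    length: int  # number of LBAs in the zone


@dataclass
class KernelZone:
    start: int     # first LBA
    size: int      # LBAs covered, including the folded-in gap
    capacity: int  # writable LBAs


def merge_gap_zones(report):
    """Fold each gap zone into the SWR zone that precedes it."""
    merged = []
    for zone in report:
        if zone.kind == "SWR":
            merged.append(KernelZone(zone.start, zone.length, zone.length))
        elif zone.kind == "GAP":
            if not merged:
                raise ValueError("gap zone without a preceding SWR zone")
            # The gap enlarges the zone size but not its capacity.
            merged[-1].size += zone.length
        else:
            raise ValueError(f"unknown zone kind: {zone.kind}")
    return merged


# The UFS zone report from the quoted example.
ufs_report = [
    Zone("SWR", 0, 12),
    Zone("GAP", 12, 4),
    Zone("SWR", 16, 12),
    Zone("GAP", 28, 4),
]
kernel_report = merge_gap_zones(ufs_report)
```

With the report from the example, this yields two zones of size 16 and capacity 12, starting at LBA 0 and LBA 16.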

Since F2FS extents must be split at gap zones, gap zones negatively affect sequential read and write performance. So we abandoned the gap zone approach. The current approach is as follows:
* The power-of-two restriction for the offset between zone starts has been removed. Gap zones are no longer required. Hence, we will need the patches that add support for zone sizes that are not a power of two.
* The Sequential Write Required (SWR) and Sequential Write Preferred (SWP) zone types are supported. The feedback we received from UFS vendors is that which zone type works best depends on their firmware and ASIC design.
* We need a queue depth larger than one (QD > 1) for writes to achieve the full sequential write bandwidth. We plan to support QD > 1 as follows:
  - If writes have to be serialized, submit these to the same hardware
    queue. According to the UFS host controller interface (UFSHCI)
    standard, UFS host controllers are not allowed to reorder SCSI
    commands that are submitted to the same hardware queue. A source of
    command reordering that remains is the SCSI retry mechanism. Retries
    happen e.g. after a command timeout.
  - For SWP zones, require the UFS device firmware to use its garbage
    collection mechanism to reorder data in the unlikely case that
    out-of-order writes happened.
  - For SWR zones, retry writes that failed because these were received
    out-of-order by a UFS device. ZBC-1 requires compliant devices to
    respond with ILLEGAL REQUEST / UNALIGNED WRITE COMMAND to
    out-of-order writes.
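
The retry idea for SWR zones can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the planned kernel implementation: SwrZone and submit_with_retry() are hypothetical names, every write covers a single LBA, and the UNALIGNED WRITE COMMAND response is modeled as a False return value.

```python
class SwrZone:
    """Toy model of a sequential-write-required zone."""

    def __init__(self, start, capacity):
        self.start = start
        self.capacity = capacity
        self.write_pointer = start
        self.data = []

    def write(self, lba, payload):
        """Accept a one-LBA write only at the write pointer."""
        in_range = lba < self.start + self.capacity
        if lba != self.write_pointer or not in_range:
            return False  # stands in for UNALIGNED WRITE COMMAND
        self.data.append(payload)
        self.write_pointer += 1
        return True


def submit_with_retry(zone, writes):
    """Resubmit rejected writes until all are accepted; return the pass count."""
    pending = list(writes)
    passes = 0
    while pending:
        passes += 1
        rejected = []
        for lba, payload in pending:
            if not zone.write(lba, payload):
                rejected.append((lba, payload))
        if len(rejected) == len(pending):
            raise RuntimeError("no write was accepted; giving up")
        pending = rejected
    return passes


zone = SwrZone(start=0, capacity=12)
# Writes for LBA 0..2 arrive at the device in the order 1, 0, 2.
passes = submit_with_retry(zone, [(1, "B"), (0, "A"), (2, "C")])
```

In this model the write for LBA 0 succeeds on the first pass and the two rejected writes succeed on the second, so the data ends up in LBA order on the medium despite the reordering.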

We have considered the zone append approach but decided not to use it because, if zone append commands get reordered, the data ends up permanently out of order on the storage medium. This affects sequential read performance negatively.
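
A small model shows the concern. It is a sketch with hypothetical helper names (zone_append_all(), is_sequential_on_media()), assuming one block per append command and a device that places each append at its current write pointer:

```python
def zone_append_all(arrival_order):
    """Place blocks at consecutive LBAs in the order the commands arrive.

    Returns a mapping {logical block number: assigned LBA}.
    """
    return {block: lba for lba, block in enumerate(arrival_order)}


def is_sequential_on_media(placement):
    """True if consecutive logical blocks occupy consecutive LBAs."""
    lbas = [placement[block] for block in sorted(placement)]
    return all(b - a == 1 for a, b in zip(lbas, lbas[1:]))


in_order = zone_append_all([0, 1, 2, 3])
reordered = zone_append_all([0, 2, 1, 3])  # appends 1 and 2 swapped in flight
```

Here in_order maps every logical block to the matching LBA, while reordered stores logical block 2 before block 1. A later sequential read of blocks 0 to 3 then has to access LBAs 0, 2, 1, 3, and nothing short of rewriting the zone fixes that.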

Bart.