Re: [RFC PATCH 1/1] lightnvm: add lzbd - a zoned block device target

From: Randy Dunlap
Date: Thu Apr 18 2019 - 17:57:11 EST


Hi,

On 4/18/19 5:01 AM, hans@xxxxxxxxxxxxx wrote:
> diff --git a/Documentation/lightnvm/lzbd.txt b/Documentation/lightnvm/lzbd.txt
> new file mode 100644
> index 000000000000..8bdbc01a25be
> --- /dev/null
> +++ b/Documentation/lightnvm/lzbd.txt
> @@ -0,0 +1,122 @@
> +lzbd: A Zoned Block Device LightNVM Target
> +==========================================
> +
> +The lzbd lightnvm target makes it possible to expose an Open-Channel 2.0 SSD
> +as one or more zoned block devices.
> +
> +Each lightnvm target is assigned a range of parallel units. Parallel units(PUs)
> +are not shared among targets avoiding I/O QoS disturbances between targets as

(prefer:) targets,

> +far as possible.
> +
> +For more information on lightnvm, see [1]

end with period above, as the 2 below are done.

> +For more information on Open-Channel 2.0, see [2].
> +For more information on zoned block devices see [3].
> +
> +lzbd is designed to act as a slim adaptor, making it possible to plug
> +OCSSD 2.0 SSDs into the zone block device ecosystem.
> +
> +lzbd manages zone to chunk mapping, read/write restrictions, wear leveling
> +and write errors.
> +
> +Zone geometry
> +-------------
> +
> +From a user perspective, lzbd targets form a number of sequential-write-required
> +(BLK_ZONE_TYPE_SEQWRITE_REQ) zones.
> +
> +Not all of the target's capacity is exposed to the user.
> +Some chunks are reserved for metadata and over-provisioning.
> +
> +The zones follow the same constraints as described in [3].
> +
> +All zones are of the same size (SZ).
> +
> +Simple example:
> +
> +Sector Zone type
> + _______________________
> +0 --> | Sequential write req. |
> + | |
> + |_______________________|
> +SZ --> | Sequential write req. |
> + | |
> + |_______________________|
> +SZ*2..--> | Sequential write req. |
> + | |
> +.......... .........................
> + |_______________________|
> +SZ*N-1 --> | Sequential write req. |
> + |_______________________|
> +
> +
> +SZ is configurable, but is restricted to a multiple of
> +(chunk size (CLBA) * Number of PUs).
> +
> +Zone to chunk mapping
> +---------------------
> +
> +Zones are spread across PUs to allow maximum write throughput through striping.
> +One or more chunks (CHK) per PU is assigned.
> +
> +Example:
> +
> +OCSSD 2.0 Geometry: 4 PUs, 16 chunks per PU.
> +Zones: 3
> +
> + Zone PU0 PU1 PU2 PU3
> +_______ _____ _____ _____ _____
> + |CHK 0|CHK 0|CHK A|CHK 0|
> + 0 |CHK 2|CHK 3|CHK 3|CHK 1|
> +_______ |_____|_____|_____|_____|
> + |CHK 3|CHK B|CHK 8|CHK A|
> + 1 |CHK 7|CHK F|CHK 2|CHK 3|
> +_______ |_____|_____|_____|_____|
> + |CHK 8|CHK 2|CHK 7|CHK 4|
> + 2 |CHK 1|CHK A|CHK 5|CHK 2|
> +_______ |_____|_____|_____|_____|
> +
> +Chunks are assigned to a zone when it is opened based on the chunk wear index.
> +
> +Note: The disk's Maximum Open Chunks (MAXOC) limit puts an upper bound on
> +maximum simultaneously open zones (unless MAXOC = 0).
> +
> +Meta data and over-provisioning
> +-------------------------------

My dictionary searches all use metadata as one word, not two.

> +
> +lzbd needs the following meta data to be persisted:
> +
> +* a zone-to chunk mapping (Z2C) table, size: 4 bytes * Number of chunks
> +* a superblock containing target configuration, guuid, on-disk format version,

what is guuid, please?

> + etc.
> +
> +Additionally, chunks need to be reserved for handling:
> +
> +* write errors
> +* chunks wearing out and going offline
> +* persisting data not aligned with the minimal write constraint
> +
> +The meta data is stored a separate set of chunks from the user data.
> +
> +Host memory requirements
> +------------------------
> +
> +The Z2C mapping table needs to be kept in host memory (see above), and:
> +
> +* in order to achieve maximum throughput and alignment requirements,
> + a small write buffer is needed
> + Size: Optimal Write Size (WS_OPT) * Maximum number of open zones.
> +
> +* to satisify OCSSD 2.0 read restrictions, a read buffer is needed.

satisfy

> + Size: Number of PUs * Cache Minimum Write Size Units (MW_CUNITS) *
> + Maximum number of open zones.
> +
> +If MW_CUNITS = 0, no read buffer is needed and data can be written without
> +any host copying/buffering (except for handling WS_OPT alignment).
> +
> +References
> +----------
> +
> +[1] Lightnvm website: http://lightnvm.io/
> +[2] OCSSD 2.0 Specification: http://lightnvm.io/docs/OCSSD-2_0-20180129.pdf
> +[3] ZBC / Zoned block device support: https://lwn.net/Articles/703871/
> +

thanks.
--
~Randy