[RFC PATCH 1/1] lightnvm: add lzbd - a zoned block device target

From: hans
Date: Thu Apr 18 2019 - 08:02:22 EST

From: Hans Holmberg <hans.holmberg@xxxxxxxxxxxx>

Introduce a new target: lzbd - LightNVM Zoned Block Device

The new target makes it possible to expose an
Open-Channel 2.0 SSD as one or more zoned block devices.

See Documentation/lightnvm/lzbd.txt for more information.

Experimental in its present state of implementation.

Signed-off-by: Hans Holmberg <hans.holmberg@xxxxxxxxxxxx>
---
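The layout arithmetic in lzbd_create_layout() can be sketched in user space
as follows. This is only an illustration under stated assumptions, not the
driver code: the names geo_sketch, layout_sketch and layout_compute() are
hypothetical stand-ins, and the sketch assumes 4 KiB device sectors, which is
why the zone size is shifted left by 3 to get 512-byte block layer sectors.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct nvm_geo fields the layout uses. */
struct geo_sketch {
	uint32_t all_luns;   /* parallel units assigned to the target */
	uint32_t all_chunks; /* total chunks across all parallel units */
	uint32_t clba;       /* device sectors per chunk (assumed 4 KiB each) */
};

/* Hypothetical stand-in for struct lzbd_disk_layout. */
struct layout_sketch {
	uint32_t zone_chunks; /* chunks per zone: one per parallel unit */
	uint64_t zone_size;   /* zone size in 512-byte sectors */
	uint32_t zones;       /* number of zones exposed to the user */
	uint64_t capacity;    /* exposed capacity in 512-byte sectors */
};

static struct layout_sketch layout_compute(const struct geo_sketch *geo,
					   uint32_t op_percent)
{
	struct layout_sketch dl;
	uint64_t user_chunks;

	dl.zone_chunks = geo->all_luns;

	/* 4 KiB device sectors -> 512-byte block layer sectors */
	dl.zone_size = ((uint64_t)geo->clba * dl.zone_chunks) << 3;

	/* Chunks left for user data after over-provisioning is set aside */
	user_chunks = (uint64_t)geo->all_chunks * (100 - op_percent) / 100;

	/* Only whole zones are exposed; leftover chunks stay reserved */
	dl.zones = (uint32_t)(user_chunks / dl.zone_chunks);
	dl.capacity = (uint64_t)dl.zones * dl.zone_size;

	return dl;
}
```

With 4 PUs, 16 chunks per PU, an assumed CLBA of 4096 and the default 20%
over-provisioning, this gives 51 user chunks, 12 zones of 131072 sectors
each, and a capacity of 1572864 sectors.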
Documentation/lightnvm/lzbd.txt | 122 +++++++++++
drivers/lightnvm/Kconfig | 11 +
drivers/lightnvm/Makefile | 3 +
drivers/lightnvm/lzbd-io.c | 342 +++++++++++++++++++++++++++++++
drivers/lightnvm/lzbd-target.c | 392 +++++++++++++++++++++++++++++++++++
drivers/lightnvm/lzbd-user.c | 310 ++++++++++++++++++++++++++++
drivers/lightnvm/lzbd-zone.c | 444 ++++++++++++++++++++++++++++++++++++++++
drivers/lightnvm/lzbd.h | 139 +++++++++++++
8 files changed, 1763 insertions(+)
create mode 100644 Documentation/lightnvm/lzbd.txt
create mode 100644 drivers/lightnvm/lzbd-io.c
create mode 100644 drivers/lightnvm/lzbd-target.c
create mode 100644 drivers/lightnvm/lzbd-user.c
create mode 100644 drivers/lightnvm/lzbd-zone.c
create mode 100644 drivers/lightnvm/lzbd.h
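
The host buffer sizes given in the "Host memory requirements" section of
Documentation/lightnvm/lzbd.txt can be estimated with a small helper like the
one below. It is a rough sketch, not part of the driver: the function names
and SKETCH_SECTOR_SIZE are hypothetical, and it assumes that WS_OPT and
MW_CUNITS are expressed in 4 KiB device sectors.

```c
#include <stdint.h>

/* Assumed device sector size in bytes (hypothetical constant). */
#define SKETCH_SECTOR_SIZE 4096u

/* Write buffer: WS_OPT sectors for every zone that may be open at once. */
static uint64_t wr_buf_bytes(uint32_t ws_opt, uint32_t max_open_zones)
{
	return (uint64_t)ws_opt * max_open_zones * SKETCH_SECTOR_SIZE;
}

/*
 * Read buffer: MW_CUNITS sectors per parallel unit for every open zone.
 * The result is zero when MW_CUNITS is 0, matching the case where no
 * read buffer is needed.
 */
static uint64_t rd_buf_bytes(uint32_t num_pus, uint32_t mw_cunits,
			     uint32_t max_open_zones)
{
	return (uint64_t)num_pus * mw_cunits * max_open_zones *
	       SKETCH_SECTOR_SIZE;
}
```

For example, with WS_OPT = 8, MW_CUNITS = 24, 4 PUs and 4 open zones, the
write buffer comes to 128 KiB and the read buffer to 1.5 MiB.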

diff --git a/Documentation/lightnvm/lzbd.txt b/Documentation/lightnvm/lzbd.txt
new file mode 100644
index 000000000000..8bdbc01a25be
--- /dev/null
+++ b/Documentation/lightnvm/lzbd.txt
@@ -0,0 +1,122 @@
+lzbd: A Zoned Block Device LightNVM Target
+==========================================
+
+The lzbd lightnvm target makes it possible to expose an Open-Channel 2.0 SSD
+as one or more zoned block devices.
+
+Each lightnvm target is assigned a range of parallel units (PUs). PUs are not
+shared among targets, which avoids I/O QoS disturbances between targets as far
+as possible.
+
+For more information on lightnvm, see [1].
+For more information on Open-Channel 2.0, see [2].
+For more information on zoned block devices, see [3].
+
+lzbd is designed to act as a slim adaptor, making it possible to plug
+OCSSD 2.0 SSDs into the zoned block device ecosystem.
+
+lzbd manages zone-to-chunk mapping, read/write restrictions, wear leveling
+and write errors.
+
+Zone geometry
+-------------
+
+From a user perspective, lzbd targets form a number of sequential-write-required
+(BLK_ZONE_TYPE_SEQWRITE_REQ) zones.
+
+Not all of the target's capacity is exposed to the user.
+Some chunks are reserved for metadata and over-provisioning.
+
+The zones follow the same constraints as described in [3].
+
+All zones are of the same size (SZ).
+
+Simple example:
+
+Sector        Zone type
+               _______________________
+0         --> | Sequential write req. |
+              |                       |
+              |_______________________|
+SZ        --> | Sequential write req. |
+              |                       |
+              |_______________________|
+SZ*2      --> | Sequential write req. |
+              |                       |
+..........    .........................
+              |_______________________|
+SZ*(N-1)  --> | Sequential write req. |
+              |_______________________|
+
+
+SZ is configurable, but is restricted to a multiple of
+(chunk size (CLBA) * Number of PUs).
+
+Zone to chunk mapping
+---------------------
+
+Zones are spread across PUs to allow maximum write throughput through striping.
+One or more chunks (CHK) per PU are assigned to each zone.
+
+Example:
+
+OCSSD 2.0 Geometry: 4 PUs, 16 chunks per PU.
+Zones: 3
+
+ Zone      PU0   PU1   PU2   PU3
+_______   _____ _____ _____ _____
+         |CHK 0|CHK 0|CHK A|CHK 0|
+   0     |CHK 2|CHK 3|CHK 3|CHK 1|
+_______  |_____|_____|_____|_____|
+         |CHK 3|CHK B|CHK 8|CHK A|
+   1     |CHK 7|CHK F|CHK 2|CHK 3|
+_______  |_____|_____|_____|_____|
+         |CHK 8|CHK 2|CHK 7|CHK 4|
+   2     |CHK 1|CHK A|CHK 5|CHK 2|
+_______  |_____|_____|_____|_____|
+
+Chunks are assigned to a zone when it is opened based on the chunk wear index.
+
+Note: The disk's Maximum Open Chunks (MAXOC) limit puts an upper bound on the
+maximum number of simultaneously open zones (unless MAXOC = 0).
+
+Meta data and over-provisioning
+-------------------------------
+
+lzbd needs the following meta data to be persisted:
+
+* a zone-to-chunk mapping (Z2C) table, size: 4 bytes * Number of chunks
+* a superblock containing target configuration, GUID, on-disk format version,
+  etc.
+
+Additionally, chunks need to be reserved for handling:
+
+* write errors
+* chunks wearing out and going offline
+* persisting data not aligned with the minimal write constraint
+
+The meta data is stored in a separate set of chunks from the user data.
+
+Host memory requirements
+------------------------
+
+The Z2C mapping table needs to be kept in host memory (see above), and:
+
+* in order to achieve maximum throughput and alignment requirements,
+ a small write buffer is needed
+ Size: Optimal Write Size (WS_OPT) * Maximum number of open zones.
+
+* to satisfy OCSSD 2.0 read restrictions, a read buffer is needed.
+ Size: Number of PUs * Cache Minimum Write Size Units (MW_CUNITS) *
+ Maximum number of open zones.
+
+If MW_CUNITS = 0, no read buffer is needed and data can be written without
+any host copying/buffering (except for handling WS_OPT alignment).
+
+References
+----------
+
+[1] Lightnvm website: http://lightnvm.io/
+[2] OCSSD 2.0 Specification: http://lightnvm.io/docs/OCSSD-2_0-20180129.pdf
+[3] ZBC / Zoned block device support: https://lwn.net/Articles/703871/
+
diff --git a/drivers/lightnvm/Kconfig b/drivers/lightnvm/Kconfig
index a872cd720967..98882874bda6 100644
--- a/drivers/lightnvm/Kconfig
+++ b/drivers/lightnvm/Kconfig
@@ -16,6 +16,17 @@ menuconfig NVM

if NVM

+config NVM_LZBD
+ tristate "Zoned Block Device Open-Channel SSD target"
+ depends on BLK_DEV_ZONED
+ help
+ Allows an open-channel SSD to be exposed as a zoned block device to the
+ host.
+
+ Highly EXPERIMENTAL for now.
+
+ Only say Y if you want to play with it.
+
config NVM_PBLK
tristate "Physical Block Device Open-Channel SSD target"
help
diff --git a/drivers/lightnvm/Makefile b/drivers/lightnvm/Makefile
index 97d9d7c71550..f9eea8b23b33 100644
--- a/drivers/lightnvm/Makefile
+++ b/drivers/lightnvm/Makefile
@@ -9,3 +9,6 @@ pblk-y := pblk-init.o pblk-core.o pblk-rb.o \
pblk-write.o pblk-cache.o pblk-read.o \
pblk-gc.o pblk-recovery.o pblk-map.o \
pblk-rl.o pblk-sysfs.o
+
+obj-$(CONFIG_NVM_LZBD) += lzbd.o
+lzbd-y := lzbd-target.o lzbd-user.o lzbd-io.o lzbd-zone.o
diff --git a/drivers/lightnvm/lzbd-io.c b/drivers/lightnvm/lzbd-io.c
new file mode 100644
index 000000000000..b210ab33fdd3
--- /dev/null
+++ b/drivers/lightnvm/lzbd-io.c
@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * Zoned block device lightnvm target
+ * Copyright (C) 2019 CNEX Labs
+ *
+ * Disk I/O
+ */
+
+#include "lzbd.h"
+
+static inline void lzbd_chunk_log(char *message, int err,
+ struct lzbd_chunk *lzbd_chunk)
+{
+
+ /* TODO: create trace points instead */
+ pr_err("lzbd: %s: err: %d grp: %d pu: %d chk: %d slba: %llu state: %d wp: %llu\n",
+ message,
+ err,
+ lzbd_chunk->ppa.m.grp,
+ lzbd_chunk->ppa.m.pu,
+ lzbd_chunk->ppa.m.chk,
+ lzbd_chunk->meta->slba,
+ lzbd_chunk->meta->state,
+ lzbd_chunk->meta->wp);
+}
+
+int lzbd_reset_chunk(struct lzbd *lzbd, struct lzbd_chunk *chunk)
+{
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_rq rqd = {NULL};
+ int ret;
+
+ if ((chunk->meta->state & (NVM_CHK_ST_FREE | NVM_CHK_ST_OFFLINE))) {
+ pr_err("lzbd: reset of chunk in illegal state: %d\n",
+ chunk->meta->state);
+ return -EINVAL;
+ }
+
+ rqd.opcode = NVM_OP_ERASE;
+ rqd.ppa_addr = chunk->ppa;
+ rqd.nr_ppas = 1;
+ rqd.is_seq = 1;
+
+ ret = nvm_submit_io_sync(dev, &rqd);
+
+ /* For now, set the chunk offline if the request fails
+ * TODO: Pass a buffer in the request so we get a full
+ * meta update from the device
+ */
+
+ if (!ret) {
+ if (rqd.error) {
+ if ((rqd.error & 0xfff) == 0x2c0) {
+ lzbd_chunk_log("chunk went offline", 0, chunk);
+ chunk->meta->state = NVM_CHK_ST_OFFLINE;
+ } else {
+ if ((rqd.error & 0xfff) == 0x2c1) {
+ lzbd_chunk_log("invalid reset",
+ -EINVAL, chunk);
+ } else {
+ lzbd_chunk_log("unknown error",
+ -EINVAL, chunk);
+ }
+ return -EINVAL;
+ }
+ } else {
+ chunk->meta->state = NVM_CHK_ST_FREE;
+ chunk->meta->wp = 0;
+ }
+ }
+
+ return ret;
+}
+
+/* Prepare a write request to a chunk. If the function call succeeds,
+ * it must be paired with a call to lzbd_free_wr_rq().
+ */
+static int lzbd_init_wr_rq(struct lzbd *lzbd, struct lzbd_chunk *chunk,
+ struct bio *bio, struct nvm_rq *rq)
+{
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ struct ppa_addr ppa;
+ struct ppa_addr *ppa_list;
+ int metadata_sz = geo->sos * NVM_MAX_VLBA;
+ int nr_ppas = geo->ws_opt;
+ int i;
+
+ memset(rq, 0, sizeof(struct nvm_rq));
+
+ rq->bio = bio;
+ rq->opcode = NVM_OP_PWRITE;
+ rq->nr_ppas = nr_ppas;
+ rq->is_seq = 1;
+ rq->private = &chunk->wr_ctx;
+
+ /* Do we respect the write size restrictions? */
+ if (nr_ppas > geo->ws_opt || (nr_ppas % geo->ws_min)) {
+ pr_err("lzbd: write size violation size: %d\n", nr_ppas);
+ return -EINVAL;
+ }
+
+ /* Is the chunk in the right state? */
+ if (!(chunk->meta->state & (NVM_CHK_ST_FREE | NVM_CHK_ST_OPEN))) {
+ pr_err("lzbd: write to chunk in wrong state: %d\n",
+ chunk->meta->state);
+ return -EINVAL;
+ }
+
+ /* Do we have room for the write? */
+ if ((chunk->meta->wp + nr_ppas) > geo->clba) {
+ pr_err("lzbd: can't fit write into chunk size %d\n", nr_ppas);
+ return -EINVAL;
+ }
+
+ rq->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
+ &rq->dma_meta_list);
+ if (!rq->meta_list)
+ return -ENOMEM;
+
+ /* We don't care about metadata yet. */
+ memset(rq->meta_list, 42, metadata_sz);
+
+ if (nr_ppas > 1) {
+ rq->ppa_list = rq->meta_list + metadata_sz;
+ rq->dma_ppa_list = rq->dma_meta_list + metadata_sz;
+ }
+
+ //pr_err("lzbd: writing %d sectors\n", nr_ppas);
+
+ ppa.ppa = chunk->ppa.ppa;
+
+ mutex_lock(&chunk->wr_ctx.wr_lock);
+
+ ppa.m.sec = chunk->meta->wp;
+
+ ppa_list = nvm_rq_to_ppa_list(rq);
+ for (i = 0; i < nr_ppas; i++) {
+ ppa_list[i].ppa = ppa.ppa;
+ ppa.m.sec++;
+ }
+
+ return 0;
+}
+
+static void lzbd_free_wr_rq(struct lzbd *lzbd, struct nvm_rq *rq)
+{
+ struct lzbd_wr_ctx *wr_ctx = rq->private;
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct lzbd_chunk *chunk;
+
+ chunk = container_of(wr_ctx, struct lzbd_chunk, wr_ctx);
+
+ mutex_unlock(&chunk->wr_ctx.wr_lock);
+ nvm_dev_dma_free(dev->parent, rq->meta_list, rq->dma_meta_list);
+}
+
+static inline void lzbd_wr_rq_post(struct nvm_rq *rq)
+{
+ struct lzbd_wr_ctx *wr_ctx = rq->private;
+ struct lzbd *lzbd = wr_ctx->lzbd;
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ struct lzbd_chunk *chunk;
+
+ chunk = container_of(wr_ctx, struct lzbd_chunk, wr_ctx);
+
+ if (!rq->error) {
+ if (chunk->meta->wp == 0)
+ chunk->meta->state = NVM_CHK_ST_OPEN;
+
+ chunk->meta->wp += rq->nr_ppas;
+ if (chunk->meta->wp == geo->clba)
+ chunk->meta->state = NVM_CHK_ST_CLOSED;
+ }
+}
+
+int lzbd_write_to_chunk_sync(struct lzbd *lzbd, struct lzbd_chunk *chunk,
+ struct bio *bio)
+{
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_rq rq;
+ int ret;
+
+ ret = lzbd_init_wr_rq(lzbd, chunk, bio, &rq);
+ if (ret)
+ return ret;
+
+ ret = nvm_submit_io_sync(dev, &rq);
+ if (ret) {
+ ret = rq.error;
+ pr_err("lzbd: sync write request submit failed: %d\n", ret);
+ } else {
+ lzbd_wr_rq_post(&rq);
+ }
+
+ lzbd_free_wr_rq(lzbd, &rq);
+
+ return ret;
+}
+
+static void lzbd_read_endio(struct nvm_rq *rq)
+{
+ struct lzbd_rd_ctx *rd_ctx = container_of(rq, struct lzbd_rd_ctx, rqd);
+ struct lzbd *lzbd = rd_ctx->lzbd;
+ struct lzbd_user_read *read = rd_ctx->read;
+ struct nvm_tgt_dev *dev = lzbd->dev;
+
+ if (unlikely(rq->error))
+ read->error = true;
+
+ if (rq->meta_list)
+ nvm_dev_dma_free(dev->parent, rq->meta_list, rq->dma_meta_list);
+
+ kref_put(&read->ref, lzbd_user_read_put);
+ kfree(rd_ctx);
+}
+
+static int lzbd_read_from_chunk_async(struct lzbd *lzbd,
+ struct lzbd_chunk *chunk,
+ struct bio *bio,
+ struct lzbd_user_read *user_read,
+ int start)
+{
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ struct lzbd_rd_ctx *rd_ctx;
+ struct nvm_rq *rq;
+ struct ppa_addr ppa;
+ struct ppa_addr *ppa_list;
+ int metadata_sz = geo->sos * NVM_MAX_VLBA;
+ int nr_ppas = lzbd_get_bio_len(bio);
+ int ret;
+ int i;
+
+ /* Do we respect the read size restrictions? */
+ if (nr_ppas >= NVM_MAX_VLBA) {
+ pr_err("lzbd: read size violation size: %d\n", nr_ppas);
+ return -EINVAL;
+ }
+
+ /* Is the chunk in the right state? */
+ if (!(chunk->meta->state & (NVM_CHK_ST_OPEN | NVM_CHK_ST_CLOSED))) {
+ pr_err("lzbd: read from chunk in wrong state: %d\n",
+ chunk->meta->state);
+ return -EINVAL;
+ }
+
+ /* Are we reading within bounds? */
+ if ((start + nr_ppas) > geo->clba) {
+ pr_err("lzbd: read past the chunk size %d start: %d\n",
+ nr_ppas, start);
+ return -EINVAL;
+ }
+
+ rd_ctx = kzalloc(sizeof(struct lzbd_rd_ctx), GFP_KERNEL);
+ if (!rd_ctx)
+ return -ENOMEM;
+
+ rd_ctx->read = user_read;
+ rd_ctx->lzbd = lzbd;
+
+ rq = &rd_ctx->rqd;
+ rq->bio = bio;
+ rq->opcode = NVM_OP_PREAD;
+ rq->nr_ppas = nr_ppas;
+ rq->end_io = lzbd_read_endio;
+ rq->private = lzbd;
+ rq->meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
+ &rq->dma_meta_list);
+ if (!rq->meta_list) {
+ kfree(rd_ctx);
+ return -ENOMEM;
+ }
+
+ if (nr_ppas > 1) {
+ rq->ppa_list = rq->meta_list + metadata_sz;
+ rq->dma_ppa_list = rq->dma_meta_list + metadata_sz;
+ }
+
+ ppa.ppa = chunk->ppa.ppa;
+ ppa.m.sec = start;
+
+ ppa_list = nvm_rq_to_ppa_list(rq);
+ for (i = 0; i < nr_ppas; i++) {
+ ppa_list[i].ppa = ppa.ppa;
+ ppa.m.sec++;
+ }
+
+ ret = nvm_submit_io(dev, rq);
+
+ if (ret) {
+ pr_err("lzbd: read request submit failed: %d\n", ret);
+ nvm_dev_dma_free(dev->parent, rq->meta_list, rq->dma_meta_list);
+ kfree(rd_ctx);
+ }
+
+ return ret;
+}
+
+int lzbd_write_to_chunk_user(struct lzbd *lzbd, struct lzbd_chunk *chunk,
+ struct bio *user_bio)
+{
+ struct bio *write_bio;
+ int ret = 0;
+
+ write_bio = bio_clone_fast(user_bio, GFP_KERNEL, &lzbd_bio_set);
+ if (!write_bio)
+ return -ENOMEM;
+
+ ret = lzbd_write_to_chunk_sync(lzbd, chunk, write_bio);
+ if (ret) {
+ ret = -EIO;
+ bio_io_error(user_bio);
+ } else {
+ ret = 0;
+ bio_endio(user_bio);
+ }
+
+ return ret;
+}
+
+int lzbd_read_from_chunk_user(struct lzbd *lzbd, struct lzbd_chunk *chunk,
+ struct bio *bio, struct lzbd_user_read *user_read,
+ int start)
+{
+ struct bio *read_bio;
+ int ret = 0;
+
+ read_bio = bio_clone_fast(bio, GFP_KERNEL, &lzbd_bio_set);
+ if (!read_bio) {
+ pr_err("lzbd: bio clone failed!\n");
+ return -ENOMEM;
+ }
+
+ ret = lzbd_read_from_chunk_async(lzbd, chunk,
+ read_bio, user_read, start);
+
+ return ret;
+}
+
diff --git a/drivers/lightnvm/lzbd-target.c b/drivers/lightnvm/lzbd-target.c
new file mode 100644
index 000000000000..04dd22873eeb
--- /dev/null
+++ b/drivers/lightnvm/lzbd-target.c
@@ -0,0 +1,392 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * Zoned block device lightnvm target
+ * Copyright (C) 2019 CNEX Labs
+ *
+ * Target handling: module boilerplate, init and remove
+ */
+
+#include <linux/module.h>
+
+#include "lzbd.h"
+
+struct bio_set lzbd_bio_set;
+
+static sector_t lzbd_capacity(void *private)
+{
+ struct lzbd *lzbd = private;
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+
+ return dl->capacity;
+}
+
+static void lzbd_free_chunks(struct lzbd *lzbd)
+{
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ struct lzbd_chunks *chunks = &lzbd->chunks;
+ int parallel_units = geo->all_luns;
+ int i;
+
+ for (i = 0; i < parallel_units; i++) {
+ struct lzbd_pu *pu = &chunks->pus[i];
+ struct list_head *pos, *n;
+ struct lzbd_chunk *chunk;
+
+ mutex_destroy(&pu->lock);
+
+ list_for_each_safe(pos, n, &pu->chk_list) {
+ chunk = list_entry(pos, struct lzbd_chunk, list);
+
+ list_del(pos);
+ mutex_destroy(&chunk->wr_ctx.wr_lock);
+ kfree(chunk);
+ }
+ }
+
+ kfree(chunks->pus);
+ vfree(chunks->meta);
+}
+
+/* Add chunk to the chunk list in rising wear index (wi) order */
+void lzbd_add_chunk(struct lzbd_chunk *chunk,
+ struct list_head *head)
+{
+ struct lzbd_chunk *c = NULL;
+
+ list_for_each_entry(c, head, list) {
+ if (chunk->meta->wi < c->meta->wi)
+ break;
+ }
+
+ list_add_tail(&chunk->list, &c->list);
+}
+
+
+static int lzbd_init_chunks(struct lzbd *lzbd)
+{
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ struct nvm_chk_meta *meta;
+ struct lzbd_chunks *chunks = &lzbd->chunks;
+ int parallel_units = geo->all_luns;
+ struct ppa_addr ppa;
+ int ret;
+ int chk;
+ int i;
+
+ chunks->pus = kcalloc(parallel_units, sizeof(struct lzbd_pu),
+ GFP_KERNEL);
+ if (!chunks->pus)
+ return -ENOMEM;
+
+ meta = vzalloc(geo->all_chunks * sizeof(*meta));
+ if (!meta) {
+ kfree(chunks->pus);
+ return -ENOMEM;
+ }
+
+ chunks->meta = meta;
+
+ for (i = 0; i < parallel_units; i++) {
+ struct lzbd_pu *lzbd_pu = &chunks->pus[i];
+
+ INIT_LIST_HEAD(&lzbd_pu->chk_list);
+ mutex_init(&lzbd_pu->lock);
+ }
+
+ ppa.ppa = 0; /* get all chunks */
+ ret = nvm_get_chunk_meta(dev, ppa, geo->all_chunks, meta);
+ if (ret) {
+ lzbd_free_chunks(lzbd);
+ return -EIO;
+ }
+
+ for (chk = 0; chk < geo->num_chk; chk++) {
+ for (i = 0; i < parallel_units; i++) {
+ struct lzbd_pu *lzbd_pu = &chunks->pus[i];
+ struct nvm_chk_meta *chk_meta;
+ int grp = i / geo->num_lun;
+ int pu = i % geo->num_lun;
+ int offset = 0;
+
+ offset += grp * geo->num_lun * geo->num_chk;
+ offset += pu * geo->num_chk;
+ offset += chk;
+
+ chk_meta = &meta[offset];
+
+ if (!(chk_meta->state & NVM_CHK_ST_OFFLINE)) {
+ struct lzbd_chunk *chunk;
+
+ chunk = kzalloc(sizeof(*chunk), GFP_KERNEL);
+ if (!chunk) {
+ lzbd_free_chunks(lzbd);
+ return -ENOMEM;
+ }
+
+ INIT_LIST_HEAD(&chunk->list);
+ chunk->meta = chk_meta;
+ chunk->ppa.m.grp = grp;
+ chunk->ppa.m.pu = pu;
+ chunk->ppa.m.chk = chk;
+ chunk->pu = i;
+
+ lzbd_add_chunk(chunk, &lzbd_pu->chk_list);
+
+ mutex_init(&chunk->wr_ctx.wr_lock);
+ chunk->wr_ctx.lzbd = lzbd;
+ } else {
+ lzbd_pu->offline_chks++;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static struct lzbd_zone *lzbd_init_zones(struct lzbd *lzbd)
+{
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ int i;
+ struct lzbd_zone *zones;
+ u64 zone_offset = 0;
+
+ zones = kmalloc_array(dl->zones, sizeof(*zones), GFP_KERNEL);
+ if (!zones)
+ return NULL;
+
+ /* Sequential zones */
+ for (i = 0; i < dl->zones; i++, zone_offset += dl->zone_size) {
+ struct lzbd_zone *zone = &zones[i];
+ struct blk_zone *bz = &zone->blk_zone;
+
+ bz->start = zone_offset;
+ bz->len = dl->zone_size;
+ bz->wp = zone_offset + dl->zone_size;
+ bz->type = BLK_ZONE_TYPE_SEQWRITE_REQ;
+ bz->cond = BLK_ZONE_COND_FULL;
+
+ bz->non_seq = 0;
+ bz->reset = 1;
+
+ /* zero-out reserved bytes to be forward-compatible */
+ memset(bz->reserved, 0, sizeof(bz->reserved));
+
+ zones[i].chunks = NULL;
+ mutex_init(&zone->lock);
+
+ zone->wr_align.buffer = NULL;
+ mutex_init(&zone->wr_align.lock);
+ }
+
+ return zones;
+}
+
+
+static void lzbd_config_disk_queue(struct lzbd *lzbd)
+{
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct gendisk *disk = lzbd->disk;
+ struct nvm_geo *geo = &dev->geo;
+ struct request_queue *bqueue = dev->q;
+ struct request_queue *dqueue = disk->queue;
+
+ blk_queue_logical_block_size(dqueue, queue_physical_block_size(bqueue));
+ blk_queue_max_hw_sectors(dqueue, queue_max_hw_sectors(bqueue));
+
+ blk_queue_write_cache(dqueue, true, false);
+
+ dqueue->limits.discard_granularity = geo->clba * geo->csecs;
+ dqueue->limits.discard_alignment = 0;
+ blk_queue_max_discard_sectors(dqueue, UINT_MAX >> 9);
+ blk_queue_flag_set(QUEUE_FLAG_DISCARD, dqueue);
+
+ dqueue->limits.zoned = BLK_ZONED_HM;
+ dqueue->nr_zones = dl->zones;
+ dqueue->limits.chunk_sectors = dl->zone_size;
+}
+
+
+static int lzbd_dev_is_supported(struct nvm_tgt_dev *dev)
+{
+ struct nvm_geo *geo = &dev->geo;
+
+ if (geo->major_ver_id != 2) {
+ pr_err("lzbd only supports Open Channel 2.x devices\n");
+ return 0;
+ }
+
+ if (geo->csecs != LZBD_SECTOR_SIZE) {
+ pr_err("lzbd: unsupported block size %d\n", geo->csecs);
+ return 0;
+ }
+
+ /* We will need to check (some of) these parameters later on,
+ * but for now, just print them. TODO: check cunit, maxoc
+ */
+ pr_info("lzbd: ws_min:%d ws_opt:%d cunits:%d maxoc:%d maxocpu:%d\n",
+ geo->ws_min, geo->ws_opt, geo->mw_cunits,
+ geo->maxoc, geo->maxocpu);
+
+ return 1;
+}
+
+
+static const struct block_device_operations lzbd_fops = {
+ .report_zones = lzbd_report_zones,
+ .owner = THIS_MODULE,
+};
+
+static void lzbd_dump_geo(struct nvm_tgt_dev *dev)
+{
+ struct nvm_geo *geo = &dev->geo;
+
+ pr_info("lzbd: target geo: num_grp: %d num_pu: %d num_chk: %d ws_opt: %d\n",
+ geo->num_ch, geo->all_luns, geo->num_chk, geo->ws_opt);
+}
+
+static void lzbd_create_layout(struct lzbd *lzbd)
+{
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ int user_chunks;
+
+ /* Default to 20% over-provisioning if not specified
+ * (better safe than sorry)
+ */
+ if (geo->op == NVM_TARGET_DEFAULT_OP)
+ dl->op = 20;
+ else
+ dl->op = geo->op;
+
+ dl->meta_chunks = 4;
+ dl->zone_chunks = geo->all_luns;
+ dl->zone_size = (geo->clba * dl->zone_chunks) << 3;
+
+ user_chunks = geo->all_chunks * (100 - dl->op);
+ user_chunks /= 100; /* plain int division; sector_div() expects a sector_t */
+
+ dl->zones = user_chunks / dl->zone_chunks;
+ dl->capacity = dl->zones * dl->zone_size;
+}
+
+static void lzbd_dump_layout(struct lzbd *lzbd)
+{
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+
+ pr_info("lzbd: layout: op: %d zones: %d per zone chks: %d secs: %llu\n",
+ dl->op, dl->zones, dl->zone_chunks,
+ (unsigned long long)dl->zone_size);
+}
+
+static void *lzbd_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
+ int flags)
+{
+ struct lzbd *lzbd;
+
+ lzbd_dump_geo(dev);
+
+ if (!lzbd_dev_is_supported(dev))
+ return ERR_PTR(-EINVAL);
+
+
+ if (!(flags & NVM_TARGET_FACTORY)) {
+ pr_err("lzbd: metadata not persisted, only factory init supported\n");
+ return ERR_PTR(-EINVAL);
+ }
+
+ lzbd = kzalloc(sizeof(struct lzbd), GFP_KERNEL);
+ if (!lzbd)
+ return ERR_PTR(-ENOMEM);
+
+ lzbd->dev = dev;
+ lzbd->disk = tdisk;
+
+ lzbd_create_layout(lzbd);
+ lzbd_dump_layout(lzbd);
+
+ lzbd->zones = lzbd_init_zones(lzbd);
+
+ if (!lzbd->zones)
+ goto err_free_lzbd;
+
+ if (lzbd_init_chunks(lzbd))
+ goto err_free_zones;
+ lzbd_config_disk_queue(lzbd);
+
+ /* Override the fops to enable zone reporting support */
+ lzbd->disk->fops = &lzbd_fops;
+
+ return lzbd;
+
+err_free_zones:
+ kfree(lzbd->zones);
+err_free_lzbd:
+ kfree(lzbd);
+
+ return ERR_PTR(-ENOMEM);
+}
+
+static void lzbd_exit(void *private, bool graceful)
+{
+ struct lzbd *lzbd = private;
+
+ lzbd_free_chunks(lzbd);
+ kfree(lzbd->zones);
+ kfree(lzbd);
+}
+
+
+static int lzbd_sysfs_init(struct gendisk *tdisk)
+{
+ /* Crickets */
+ return 0;
+}
+
+static void lzbd_sysfs_exit(struct gendisk *tdisk)
+{
+ /* Tumbleweed */
+}
+
+static struct nvm_tgt_type tt_lzbd = {
+ .name = "lzbd",
+ .version = {0, 0, 1},
+
+ .init = lzbd_init,
+ .exit = lzbd_exit,
+
+ .capacity = lzbd_capacity,
+ .make_rq = lzbd_make_rq,
+
+ .sysfs_init = lzbd_sysfs_init,
+ .sysfs_exit = lzbd_sysfs_exit,
+
+ .owner = THIS_MODULE,
+};
+
+static int __init lzbd_module_init(void)
+{
+ int ret;
+
+ ret = bioset_init(&lzbd_bio_set, BIO_POOL_SIZE, 0, 0);
+ if (ret)
+ return ret;
+
+ return nvm_register_tgt_type(&tt_lzbd);
+}
+
+static void lzbd_module_exit(void)
+{
+ bioset_exit(&lzbd_bio_set);
+ nvm_unregister_tgt_type(&tt_lzbd);
+}
+
+module_init(lzbd_module_init);
+module_exit(lzbd_module_exit);
+MODULE_AUTHOR("Hans Holmberg <hans.holmberg@xxxxxxxxxxxx>");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Zoned Block-Device for Open-Channel SSDs");
diff --git a/drivers/lightnvm/lzbd-user.c b/drivers/lightnvm/lzbd-user.c
new file mode 100644
index 000000000000..e38ec763941e
--- /dev/null
+++ b/drivers/lightnvm/lzbd-user.c
@@ -0,0 +1,310 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * Zoned block device lightnvm target
+ * Copyright (C) 2019 CNEX Labs
+ *
+ * User interfacing code: read/write/reset requests
+ */
+
+#include "lzbd.h"
+
+static void lzbd_fail_bio(struct bio *bio, char *op)
+{
+ pr_err("lzbd: failing %s. start lba: %lu length: %lu\n", op,
+ lzbd_get_bio_lba(bio), lzbd_get_bio_len(bio));
+
+ bio_io_error(bio);
+}
+
+static struct lzbd_zone *lzbd_get_zone(struct lzbd *lzbd, sector_t sector)
+{
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ struct lzbd_zone *zone;
+ struct blk_zone *bz;
+
+ sector_div(sector, dl->zone_size);
+
+ if (sector >= dl->zones)
+ return NULL;
+
+ zone = &lzbd->zones[sector];
+ bz = &zone->blk_zone;
+
+ return zone;
+}
+
+static int lzbd_write_rq(struct lzbd *lzbd, struct lzbd_zone *zone,
+ struct bio *bio)
+{
+ sector_t sector = bio->bi_iter.bi_sector;
+ sector_t nr_secs = lzbd_get_bio_len(bio);
+ struct blk_zone *bz;
+ int left;
+
+ mutex_lock(&zone->lock);
+
+ bz = &zone->blk_zone;
+
+ if (bz->cond == BLK_ZONE_COND_OFFLINE) {
+ mutex_unlock(&zone->lock);
+ return -EIO;
+ }
+
+ if (bz->cond == BLK_ZONE_COND_EMPTY)
+ bz->cond = BLK_ZONE_COND_IMP_OPEN;
+
+ if (sector != bz->wp) {
+ if (sector == bz->start) {
+ if (lzbd_zone_reset(lzbd, zone)) {
+ pr_err("lzbd: zone reset failed\n");
+ bz->cond = BLK_ZONE_COND_OFFLINE;
+ mutex_unlock(&zone->lock);
+ return -EIO;
+ }
+ bz->cond = BLK_ZONE_COND_IMP_OPEN;
+ bz->wp = bz->start;
+ } else {
+ pr_err("lzbd: write pointer error\n");
+ mutex_unlock(&zone->lock);
+ return -EIO;
+ }
+ }
+
+ left = lzbd_zone_write(lzbd, zone, bio);
+
+ bz->wp += (nr_secs - left) << 3;
+ if (bz->wp == (bz->start + bz->len)) {
+ lzbd_zone_free_wr_buffer(zone);
+ bz->cond = BLK_ZONE_COND_FULL;
+ }
+
+ mutex_unlock(&zone->lock);
+
+ if (left > 0) {
+ pr_err("lzbd: write did not complete\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int lzbd_read_rq(struct lzbd *lzbd, struct lzbd_zone *zone,
+ struct bio *bio)
+{
+ struct blk_zone *bz;
+ sector_t read_end, data_end;
+ sector_t data_start = bio->bi_iter.bi_sector;
+ int ret;
+
+ if (!zone) {
+ lzbd_fail_bio(bio, "lzbd: no zone mapped to read sector");
+ return -EIO;
+ }
+
+ bz = &zone->blk_zone;
+
+ if (!zone->chunks || bz->cond == BLK_ZONE_COND_OFFLINE) {
+ /* No valid data in this zone */
+ zero_fill_bio(bio);
+ bio_endio(bio);
+ return 0;
+ }
+
+ if (data_start >= bz->wp) {
+ zero_fill_bio(bio);
+ bio_endio(bio);
+ return 0;
+ }
+
+ read_end = bio_end_sector(bio);
+ data_end = min_t(sector_t, bz->wp, read_end);
+
+ if (read_end > data_end) {
+ sector_t split_sz = data_end - data_start;
+ struct bio *split;
+
+ if (data_end <= data_start) {
+ lzbd_fail_bio(bio, "internal error(read)");
+ return -EIO;
+ }
+
+ split = bio_split(bio, split_sz,
+ GFP_KERNEL, &lzbd_bio_set);
+
+ ret = lzbd_zone_read(lzbd, zone, split);
+ if (ret) {
+ lzbd_fail_bio(bio, "split read");
+ return -EIO;
+ }
+
+ zero_fill_bio(bio);
+ bio_endio(bio);
+
+ } else {
+ lzbd_zone_read(lzbd, zone, bio);
+ }
+
+ return 0;
+}
+
+static void lzbd_zone_reset_rq(struct lzbd *lzbd, struct request_queue *q,
+ struct bio *bio)
+{
+ sector_t sector = bio->bi_iter.bi_sector;
+ struct lzbd_zone *zone;
+
+ zone = lzbd_get_zone(lzbd, sector);
+
+ if (zone) {
+ struct blk_zone *bz = &zone->blk_zone;
+ int ret;
+
+ mutex_lock(&zone->lock);
+
+ ret = lzbd_zone_reset(lzbd, zone);
+ if (ret) {
+ bz->cond = BLK_ZONE_COND_OFFLINE;
+ lzbd_fail_bio(bio, "zone reset");
+ mutex_unlock(&zone->lock);
+ return;
+ }
+
+ bz->cond = BLK_ZONE_COND_EMPTY;
+ bz->wp = bz->start;
+
+ mutex_unlock(&zone->lock);
+
+ bio_endio(bio);
+ } else {
+ bio_io_error(bio);
+ }
+}
+
+static void lzbd_discard_rq(struct lzbd *lzbd, struct request_queue *q,
+ struct bio *bio)
+{
+ /* TODO: Implement discard */
+ bio_endio(bio);
+}
+
+static struct bio *lzbd_zplit(struct lzbd *lzbd, struct bio *bio,
+ struct lzbd_zone **first_zone)
+{
+ sector_t bio_start = bio->bi_iter.bi_sector;
+ sector_t bio_end, zone_end;
+ struct lzbd_zone *zone;
+ struct blk_zone *bz;
+ struct bio *zone_bio;
+
+ zone = lzbd_get_zone(lzbd, bio_start);
+ if (!zone)
+ return NULL;
+
+ bio_end = bio_end_sector(bio);
+ bz = &zone->blk_zone;
+ zone_end = bz->start + bz->len;
+
+ if (bio_end > zone_end) {
+ zone_bio = bio_split(bio, zone_end - bio_start,
+ GFP_KERNEL, &lzbd_bio_set);
+ } else {
+ zone_bio = bio;
+ }
+
+ *first_zone = zone;
+ return zone_bio;
+}
+
+blk_qc_t lzbd_make_rq(struct request_queue *q, struct bio *bio)
+{
+ struct lzbd *lzbd = q->queuedata;
+
+ if (bio->bi_opf & REQ_PREFLUSH) {
+ /* TODO: Implement syncs */
+ pr_err("lzbd: ignoring sync!\n");
+ }
+
+ if (bio_op(bio) == REQ_OP_READ || bio_op(bio) == REQ_OP_WRITE) {
+ struct bio *zplit;
+ struct lzbd_zone *zone;
+
+ if (!lzbd_get_bio_len(bio)) {
+ bio_endio(bio);
+ return BLK_QC_T_NONE;
+ }
+
+ do {
+ zplit = lzbd_zplit(lzbd, bio, &zone);
+ if (!zplit || !zone) {
+ lzbd_fail_bio(bio, "zone split");
+ return BLK_QC_T_NONE;
+ }
+
+ if (op_is_write(bio_op(bio))) {
+ if (lzbd_write_rq(lzbd, zone, zplit)) {
+ lzbd_fail_bio(zplit, "write");
+ if (zplit != bio)
+ lzbd_fail_bio(bio,
+ "write");
+
+ return BLK_QC_T_NONE;
+ }
+ } else {
+ if (lzbd_read_rq(lzbd, zone, zplit)) {
+ lzbd_fail_bio(zplit, "read");
+ if (zplit != bio)
+ lzbd_fail_bio(bio,
+ "read");
+ return BLK_QC_T_NONE;
+ }
+ }
+ } while (bio != zplit);
+
+ return BLK_QC_T_NONE;
+ }
+
+ switch (bio_op(bio)) {
+ case REQ_OP_DISCARD:
+ lzbd_discard_rq(lzbd, q, bio);
+ break;
+ case REQ_OP_ZONE_RESET:
+ lzbd_zone_reset_rq(lzbd, q, bio);
+ break;
+ default:
+ pr_err("lzbd: unsupported operation: %d\n", bio_op(bio));
+ bio_io_error(bio);
+ break;
+ }
+
+ return BLK_QC_T_NONE;
+}
+
+int lzbd_report_zones(struct gendisk *disk, sector_t sector,
+ struct blk_zone *zones, unsigned int *nr_zones,
+ gfp_t gfp_mask)
+{
+ struct lzbd *lzbd = disk->private_data;
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ unsigned int max_zones = *nr_zones;
+ unsigned int reported = 0;
+ struct lzbd_zone *zone;
+
+ /* lzbd_get_zone() maps a sector to its zone; keep sector in sector units */
+
+ while ((zone = lzbd_get_zone(lzbd, sector))) {
+ struct blk_zone *bz = &zone->blk_zone;
+
+ if (reported >= max_zones)
+ break;
+
+ memcpy(&zones[reported], bz, sizeof(*bz));
+
+ sector = sector + dl->zone_size;
+ reported++;
+ }
+
+ *nr_zones = reported;
+
+ return 0;
+}
diff --git a/drivers/lightnvm/lzbd-zone.c b/drivers/lightnvm/lzbd-zone.c
new file mode 100644
index 000000000000..813f7b006ef1
--- /dev/null
+++ b/drivers/lightnvm/lzbd-zone.c
@@ -0,0 +1,444 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * Zoned block device lightnvm target
+ * Copyright (C) 2019 CNEX Labs
+ *
+ * Internal zone handling
+ */
+
+#include "lzbd.h"
+
+static struct lzbd_chunk *lzbd_get_chunk(struct lzbd *lzbd, int pref_pu)
+{
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ int parallel_units = geo->all_luns;
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ struct lzbd_chunks *chunks = &lzbd->chunks;
+ int i = pref_pu;
+ int retries = dl->zone_chunks - 1;
+
+ do {
+ struct lzbd_pu *pu = &chunks->pus[i];
+ struct list_head *chk_list = &pu->chk_list;
+
+ mutex_lock(&pu->lock);
+
+ if (!list_empty(&pu->chk_list)) {
+ struct lzbd_chunk *chunk;
+
+ chunk = list_first_entry(chk_list,
+ struct lzbd_chunk, list);
+ list_del(&chunk->list);
+ mutex_unlock(&pu->lock);
+ return chunk;
+ }
+ mutex_unlock(&pu->lock);
+
+ if (++i == parallel_units)
+ i = 0;
+
+ } while (retries--);
+
+ return NULL;
+}
+
+void lzbd_zone_free_wr_buffer(struct lzbd_zone *zone)
+{
+ kfree(zone->wr_align.buffer);
+ zone->wr_align.buffer = NULL;
+ zone->wr_align.secs = 0;
+}
+
+static void lzbd_zone_deallocate(struct lzbd *lzbd, struct lzbd_zone *zone)
+{
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ struct lzbd_chunks *chunks = &lzbd->chunks;
+ int i;
+
+ if (!zone->chunks)
+ return;
+
+ for (i = 0; i < dl->zone_chunks; i++) {
+ struct lzbd_chunk *chunk = zone->chunks[i];
+
+ if (chunk) {
+ struct lzbd_pu *pu = &chunks->pus[chunk->pu];
+
+ mutex_lock(&pu->lock);
+
+ /* TODO: implement proper wear leveling
+ * The wear indices do not get updated right now
+ * so just add the chunk at the bottom of the list
+ */
+ list_add_tail(&chunk->list, &pu->chk_list);
+ mutex_unlock(&pu->lock);
+ }
+ }
+
+ lzbd_zone_free_wr_buffer(zone);
+ kfree(zone->chunks);
+ zone->chunks = NULL;
+}
+
+int lzbd_zone_allocate(struct lzbd *lzbd, struct lzbd_zone *zone)
+{
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ int to_allocate = dl->zone_chunks;
+ int i;
+
+ zone->chunks = kcalloc(to_allocate,
+ sizeof(struct lzbd_chunk *),
+ GFP_KERNEL);
+ if (!zone->chunks)
+ return -ENOMEM;
+
+ zone->wr_align.secs = 0;
+
+ zone->wr_align.buffer = kzalloc(geo->ws_opt << LZBD_SECTOR_BITS,
+ GFP_KERNEL);
+ if (!zone->wr_align.buffer) {
+ kfree(zone->chunks);
+ zone->chunks = NULL; /* avoid a dangling pointer on later reset/read */
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < to_allocate; i++) {
+ struct lzbd_chunk *chunk = lzbd_get_chunk(lzbd, i);
+
+ if (!chunk) {
+ pr_err("failed to allocate zone!\n");
+ lzbd_zone_deallocate(lzbd, zone);
+ return -ENOSPC;
+ }
+
+ zone->chunks[i] = chunk;
+ }
+
+ return 0;
+}
+
+static int lzbd_zone_reset_chunks(struct lzbd *lzbd, struct lzbd_zone *zone)
+{
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ int i = 0;
+
+ /* TODO: Do parallel resetting and handle reset failures */
+ for (i = 0; i < dl->zone_chunks; i++) {
+ struct lzbd_chunk *chunk = zone->chunks[i];
+ int state = chunk->meta->state;
+ int ret;
+
+ if (state & (NVM_CHK_ST_CLOSED | NVM_CHK_ST_OPEN)) {
+ ret = lzbd_reset_chunk(lzbd, chunk);
+ if (ret) {
+ pr_err("lzbd: reset failed!\n");
+ return -EIO; /* Fail for now if reset fails */
+ }
+ }
+ }
+
+ return 0;
+}
+
+int lzbd_zone_reset(struct lzbd *lzbd, struct lzbd_zone *zone)
+{
+ int ret;
+
+ lzbd_zone_deallocate(lzbd, zone);
+ ret = lzbd_zone_allocate(lzbd, zone);
+ if (ret)
+ return ret;
+
+ ret = lzbd_zone_reset_chunks(lzbd, zone);
+
+ zone->wi = 0;
+ atomic_set(&zone->s_wp, 0);
+
+ return ret;
+}
+
+
+static void lzbd_add_to_align_buf(struct lzbd_wr_align *wr_align,
+ struct bio *bio, int secs)
+{
+ char *buffer = wr_align->buffer;
+
+ buffer += (wr_align->secs * LZBD_SECTOR_SIZE);
+
+ mutex_lock(&wr_align->lock);
+ while (secs--) {
+ char *data = bio_data(bio);
+
+ memcpy(buffer, data, LZBD_SECTOR_SIZE);
+ buffer += LZBD_SECTOR_SIZE;
+ wr_align->secs++;
+ bio_advance(bio, LZBD_SECTOR_SIZE);
+
+ }
+
+ mutex_unlock(&wr_align->lock);
+}
+
+static void lzbd_read_from_align_buf(struct lzbd_wr_align *wr_align,
+ struct bio *bio, int start, int secs)
+{
+ char *buffer = wr_align->buffer;
+
+ buffer += (start * LZBD_SECTOR_SIZE);
+
+ mutex_lock(&wr_align->lock);
+ while (secs--) {
+ char *data = bio_data(bio);
+
+ memcpy(data, buffer, LZBD_SECTOR_SIZE);
+ buffer += LZBD_SECTOR_SIZE;
+
+ bio_advance(bio, LZBD_SECTOR_SIZE);
+ }
+
+ mutex_unlock(&wr_align->lock);
+}
+
+int lzbd_zone_write(struct lzbd *lzbd, struct lzbd_zone *zone, struct bio *bio)
+{
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ struct lzbd_wr_align *wr_align = &zone->wr_align;
+ int sectors_left = lzbd_get_bio_len(bio);
+ int ret;
+
+ /* Unaligned write? */
+ if (wr_align->secs) {
+ int secs;
+
+ secs = min_t(int, geo->ws_opt - wr_align->secs, sectors_left);
+ lzbd_add_to_align_buf(wr_align, bio, secs);
+ sectors_left -= secs;
+
+ /* Time to flush the alignment buffer? */
+ if (wr_align->secs == geo->ws_opt) {
+ struct bio *align_bio;
+
+ align_bio = bio_map_kern(dev->q, wr_align->buffer,
+ geo->ws_opt * LZBD_SECTOR_SIZE,
+ GFP_KERNEL);
+ if (IS_ERR(align_bio)) {
+ pr_err("lzbd: failed to map align bio\n");
+ return -EIO;
+ }
+
+ ret = lzbd_write_to_chunk_user(lzbd,
+ zone->chunks[zone->wi], align_bio);
+
+ if (ret) {
+ pr_err("lzbd: alignment write failed\n");
+ return sectors_left;
+ }
+
+ wr_align->secs = 0;
+ zone->wi = (zone->wi + 1) % dl->zone_chunks;
+ atomic_add(geo->ws_opt, &zone->s_wp);
+ }
+ }
+
+ if (sectors_left == 0) {
+ bio_endio(bio);
+ return 0;
+ }
+
+ while (sectors_left > geo->ws_opt) {
+ struct bio *split;
+
+ split = bio_split(bio, geo->ws_opt << LZBD_SECTOR_SHIFT,
+ GFP_KERNEL, &lzbd_bio_set);
+
+ if (split == NULL) {
+ pr_err("lzbd: split failed!\n");
+ return sectors_left;
+ }
+
+ ret = lzbd_write_to_chunk_user(lzbd,
+ zone->chunks[zone->wi], split);
+
+ if (ret)
+ return sectors_left;
+
+ zone->wi = (zone->wi + 1) % dl->zone_chunks;
+ atomic_add(geo->ws_opt, &zone->s_wp);
+
+ sectors_left -= geo->ws_opt;
+ }
+
+ if (sectors_left == geo->ws_opt) {
+ ret = lzbd_write_to_chunk_user(lzbd,
+ zone->chunks[zone->wi], bio);
+ if (ret) {
+ pr_err("lzbd: last aligned write failed\n");
+ return sectors_left;
+ }
+
+ zone->wi = (zone->wi + 1) % dl->zone_chunks;
+ atomic_add(geo->ws_opt, &zone->s_wp);
+ sectors_left -= geo->ws_opt;
+ } else {
+ wr_align->secs = 0;
+ lzbd_add_to_align_buf(wr_align, bio, sectors_left);
+ bio_endio(bio);
+ sectors_left = 0;
+ }
+
+ return sectors_left;
+}
+
+void lzbd_user_read_put(struct kref *ref)
+{
+ struct lzbd_user_read *read;
+
+ read = container_of(ref, struct lzbd_user_read, ref);
+
+ if (unlikely(read->error))
+ bio_io_error(read->user_bio);
+ else
+ bio_endio(read->user_bio);
+
+ kfree(read);
+}
+
+
+static struct lzbd_user_read *lzbd_init_user_read(struct bio *bio)
+{
+ struct lzbd_user_read *rd;
+
+ rd = kmalloc(sizeof(struct lzbd_user_read), GFP_KERNEL);
+ if (!rd)
+ return NULL;
+
+ rd->user_bio = bio;
+ kref_init(&rd->ref);
+ rd->error = false;
+
+ return rd;
+}
+
+
+int lzbd_zone_read(struct lzbd *lzbd, struct lzbd_zone *zone, struct bio *bio)
+{
+ struct lzbd_disk_layout *dl = &lzbd->disk_layout;
+ struct nvm_tgt_dev *dev = lzbd->dev;
+ struct nvm_geo *geo = &dev->geo;
+ struct blk_zone *bz = &zone->blk_zone;
+ struct lzbd_chunk *read_chunk;
+ sector_t lba = lzbd_get_bio_lba(bio);
+ int to_read = lzbd_get_bio_len(bio);
+ struct lzbd_user_read *read;
+ int readsize;
+ int zsi, zso, csi, co;
+ int pu;
+ int ret;
+
+ read = lzbd_init_user_read(bio);
+ if (!read) {
+ pr_err("lzbd: failed to init read\n");
+ bio_io_error(bio);
+ return -EIO;
+ }
+
+ if (!zone->chunks) {
+ /* No data has been written to this zone */
+ zero_fill_bio(bio);
+ bio_endio(bio);
+ kfree(read);
+ return 0;
+ }
+
+ lba -= bz->start >> LZBD_SECTOR_SHIFT;
+
+ /* TODO: use sector_div instead */
+
+ /* Zone stripe index and offset */
+ zsi = lba / geo->ws_opt; /* zone stripe index */
+ zso = lba % geo->ws_opt; /* zone stripe offset */
+
+ pu = zsi % dl->zone_chunks;
+ read_chunk = zone->chunks[pu];
+
+ /* Chunk stripe index and chunk offset */
+ csi = lba / (dl->zone_chunks * geo->ws_opt);
+ co = csi * geo->ws_opt + zso;
+
+ readsize = min_t(int, geo->ws_opt - zso, to_read);
+
+ while (to_read > 0) {
+ struct bio *rbio = bio;
+ int s_wp = atomic_read(&zone->s_wp);
+
+ if (lba >= s_wp) {
+ /* Grab the write lock to prevent races
+ * with writes
+ */
+ mutex_lock(&zone->lock);
+ if (lba >= atomic_read(&zone->s_wp)) {
+ lzbd_read_from_align_buf(&zone->wr_align, bio,
+ zso, to_read);
+ mutex_unlock(&zone->lock);
+ ret = 0;
+ goto done;
+ }
+ mutex_unlock(&zone->lock);
+ }
+
+ if ((zso + to_read) > geo->ws_opt) {
+
+ rbio = bio_split(bio, readsize << LZBD_SECTOR_SHIFT,
+ GFP_KERNEL, &lzbd_bio_set);
+
+ if (!rbio) {
+ read->error = true;
+ ret = -EIO;
+ goto done;
+ }
+
+ }
+
+ if (lba + to_read >= s_wp)
+ readsize = s_wp - lba;
+
+ kref_get(&read->ref);
+ ret = lzbd_read_from_chunk_user(lzbd, zone->chunks[pu],
+ rbio, read, co);
+ if (ret) {
+ pr_err("lzbd: user disk read failed!\n");
+ read->error = true;
+ kref_put(&read->ref, lzbd_user_read_put);
+ ret = -EIO;
+ goto done;
+ }
+
+ lba += readsize;
+
+ if (zso) {
+ co -= zso;
+ zso = 0;
+ }
+
+ if (++pu == dl->zone_chunks) {
+ pu = 0;
+ co += geo->ws_opt;
+ }
+
+ to_read -= readsize;
+ readsize = min_t(int, geo->ws_opt, to_read);
+ read_chunk = zone->chunks[pu];
+ }
+
+ ret = 0;
+done:
+ kref_put(&read->ref, lzbd_user_read_put);
+ return ret;
+}
+
diff --git a/drivers/lightnvm/lzbd.h b/drivers/lightnvm/lzbd.h
new file mode 100644
index 000000000000..97cca99a49bf
--- /dev/null
+++ b/drivers/lightnvm/lzbd.h
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *
+ * Zoned block device lightnvm target
+ * Copyright (C) 2019 CNEX Labs
+ *
+ */
+
+#include <linux/blkdev.h>
+#include <linux/blk-mq.h>
+#include <linux/bio.h>
+#include <linux/lightnvm.h>
+
+#define LZBD_SECTOR_BITS (12) /* 4096 */
+#define LZBD_SECTOR_SIZE (4096UL)
+
+/* Shift converting 512-byte sectors to 4 KiB lzbd sectors */
+#define LZBD_SECTOR_SHIFT (3)
+
+extern struct bio_set lzbd_bio_set;
+
+
+/* Get length, in lzbd sectors, of bio */
+static inline sector_t lzbd_get_bio_len(struct bio *bio)
+{
+ return bio->bi_iter.bi_size >> LZBD_SECTOR_BITS;
+}
+
+/* Get bio start lba in lzbd sectors */
+static inline sector_t lzbd_get_bio_lba(struct bio *bio)
+{
+ return bio->bi_iter.bi_sector >> LZBD_SECTOR_SHIFT;
+}
+
+struct lzbd_wr_ctx {
+ struct lzbd *lzbd;
+ struct mutex wr_lock; /* Max one outstanding write */
+
+ void *private;
+ /* bio completion list goes here, along with lock */
+};
+
+struct lzbd_user_read {
+ struct bio *user_bio;
+ struct kref ref;
+ bool error;
+};
+
+struct lzbd_rd_ctx {
+ struct lzbd *lzbd;
+ struct lzbd_user_read *read;
+ struct nvm_rq rqd;
+};
+
+struct lzbd_chunk {
+ struct nvm_chk_meta *meta; /* Metadata for the chunk */
+ struct ppa_addr ppa; /* Start ppa */
+ int pu; /* Parallel unit */
+
+ struct lzbd_wr_ctx wr_ctx;
+ struct list_head list; /* A chunk is offline or
+ * part of a PU free list or
+ * part of a zone chunk list or
+ * part of a metadata list
+ */
+
+ /* a cunits buffer should go here */
+};
+
+struct lzbd_pu {
+ struct list_head chk_list; /* One list per parallel unit */
+ struct mutex lock; /* Protecting list */
+ int offline_chks;
+};
+
+struct lzbd_chunks {
+ struct lzbd_pu *pus; /* Chunks organized per parallel unit */
+ struct nvm_chk_meta *meta; /* Metadata for all chunks */
+};
+
+struct lzbd_wr_align {
+ void *buffer; /* Buffer data */
+ int secs; /* Number of 4k secs in buffer */
+ struct mutex lock;
+};
+
+struct lzbd_zone {
+ struct blk_zone blk_zone;
+ struct lzbd_chunk **chunks;
+
+ int wi; /* Write chunk index */
+ atomic_t s_wp; /* Sync write pointer */
+
+ struct lzbd_wr_align wr_align; /* Write alignment buffer */
+
+ struct mutex lock; /* Write lock */
+};
+
+struct lzbd_disk_layout {
+ int op; /* Over-provisioning ratio */
+ int meta_chunks; /* Metadata chunks */
+
+ int zones; /* Number of zones */
+ int zone_chunks; /* Chunks per zone */
+ sector_t zone_size; /* Number of 512b sectors per zone */
+
+ sector_t capacity; /* Disk capacity in 512b sectors */
+};
+
+struct lzbd {
+ struct nvm_tgt_dev *dev;
+ struct gendisk *disk;
+
+ struct lzbd_zone *zones;
+
+ struct lzbd_chunks chunks;
+ struct lzbd_disk_layout disk_layout;
+};
+
+blk_qc_t lzbd_make_rq(struct request_queue *q, struct bio *bio);
+
+int lzbd_report_zones(struct gendisk *disk, sector_t sector,
+ struct blk_zone *zones, unsigned int *nr_zones,
+ gfp_t gfp_mask);
+
+int lzbd_reset_chunk(struct lzbd *lzbd, struct lzbd_chunk *chunk);
+int lzbd_write_to_chunk_sync(struct lzbd *lzbd, struct lzbd_chunk *chunk,
+ struct bio *bio);
+int lzbd_write_to_chunk_user(struct lzbd *lzbd, struct lzbd_chunk *chunk,
+ struct bio *user_bio);
+int lzbd_read_from_chunk_user(struct lzbd *lzbd, struct lzbd_chunk *chunk,
+ struct bio *bio, struct lzbd_user_read *user_read,
+ int start);
+int lzbd_zone_reset(struct lzbd *lzbd, struct lzbd_zone *zone);
+int lzbd_zone_write(struct lzbd *lzbd, struct lzbd_zone *zone, struct bio *bio);
+int lzbd_zone_read(struct lzbd *lzbd, struct lzbd_zone *zone, struct bio *bio);
+void lzbd_zone_free_wr_buffer(struct lzbd_zone *zone);
+void lzbd_user_read_put(struct kref *ref);
+
--
2.7.4
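
As an aside for reviewers, the striping arithmetic in lzbd_zone_read()
can be checked in isolation. The following is a minimal userspace sketch
and is not part of the patch: struct stripe_addr and lzbd_map_lba() are
hypothetical names introduced here for illustration only. It assumes the
LBA is already zone-relative and expressed in 4 KiB lzbd sectors, and it
only mirrors the zsi/zso/csi/co computation done before the read loop.

```c
#include <assert.h>

/* Hypothetical result type for illustration; not defined by the patch. */
struct stripe_addr {
	int pu;  /* index into zone->chunks[] */
	int co;  /* sector offset within that chunk */
	int len; /* sectors readable before the stripe boundary */
};

/*
 * Map a zone-relative LBA to a chunk index and chunk offset, following
 * the same arithmetic as the start of lzbd_zone_read().
 * ws_opt is the stripe unit in sectors, zone_chunks the chunks per zone.
 */
static struct stripe_addr lzbd_map_lba(int lba, int ws_opt,
				       int zone_chunks, int to_read)
{
	struct stripe_addr a;
	int zsi, zso, csi;

	assert(lba >= 0 && ws_opt > 0 && zone_chunks > 0);

	zsi = lba / ws_opt;                 /* zone stripe index */
	zso = lba % ws_opt;                 /* offset inside that stripe */
	csi = lba / (zone_chunks * ws_opt); /* chunk stripe index */

	a.pu = zsi % zone_chunks;
	a.co = csi * ws_opt + zso;
	a.len = (ws_opt - zso < to_read) ? ws_opt - zso : to_read;

	return a;
}
```

For example, with ws_opt = 8 and zone_chunks = 4, zone-relative LBA 35
maps to chunk index 0 at chunk offset 11, and at most 5 sectors can be
read before the next stripe boundary.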
