[RFC/PATCH 1/5 v2] mtd: ubi: Read disturb infrastructure

From: Tanya Brokhman
Date: Sun Oct 26 2014 - 09:49:55 EST


The need for read-disturb handling is determined according to new
statistics collected per eraseblock:
- read counter: incremented at each read operation,
  reset at each erase
- last erase time stamp: updated at each erase

This patch adds the infrastructure for the above statistics.

Signed-off-by: Tanya Brokhman <tlinder@xxxxxxxxxxxxxx>
---

Changes from V1:
- Documentation file was added

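For illustration, the default read counter policy from the "Error
recovery" section of the documentation file below, together with the
scrubbing trigger it describes, can be sketched in plain user-space C.
This is a minimal sketch and not code from this patch:
default_read_counter() and rd_needs_scrub() are hypothetical helper
names, and reading "reaches a pre-defined threshold" as >= is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors UBI_RD_THRESHOLD as defined by this patch in ubi.h. */
#define RD_THRESHOLD_DEFAULT 100000

/*
 * Hypothetical helper: read counter assigned to a PEB when no valid
 * fastmap is available at attach time.
 */
int default_read_counter(bool peb_is_free, int rd_threshold)
{
	assert(rd_threshold >= 0);

	/* A free PEB was erased, so no reads have accumulated on it. */
	if (peb_is_free)
		return 0;

	/* Allocated PEB: history unknown, assume half the threshold. */
	return rd_threshold / 2;
}

/*
 * Hypothetical helper: true when the PEB should be scheduled for
 * scrubbing. The >= comparison is an assumption.
 */
bool rd_needs_scrub(int rc, int rd_threshold)
{
	return rc >= rd_threshold;
}
```

With the default threshold, an allocated PEB starts at 50000 after a
lost fastmap, so it is scrubbed after at most another 50000 reads.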

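Similarly, the data retention check (last erase timestamp compared
against a threshold in days) could look roughly like the sketch below.
It assumes the stored timestamp is in seconds, as the comment on
last_erase_time in struct ubi_ec_hdr suggests, and converts the age to
whole days before comparing. dt_needs_scrub() is a hypothetical name,
and treating a timestamp from the future as "age zero" is an assumption
of this sketch, not behaviour defined by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECONDS_PER_DAY 86400

/* Mirrors UBI_DT_THRESHOLD as defined by this patch (in days). */
#define DT_THRESHOLD_DEFAULT 120

/*
 * Hypothetical helper: true when the time since the last erase is
 * higher than the data retention threshold.
 */
bool dt_needs_scrub(int64_t now_sec, int64_t last_erase_sec,
		    int dt_threshold_days)
{
	int64_t age_days;

	assert(dt_threshold_days >= 0);

	/* Clock went backwards or timestamp is bogus: treat as fresh. */
	if (now_sec <= last_erase_sec)
		return false;

	age_days = (now_sec - last_erase_sec) / SECONDS_PER_DAY;

	/* "Higher than" the threshold, so a strict comparison. */
	return age_days > dt_threshold_days;
}
```

A PEB erased exactly 120 days ago is not yet scheduled with the default
threshold; one erased 121 days ago is.
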
Documentation/mtd/ubi/ubi-read-disturb.txt | 145 +++++++++++++++++++++++++++++
drivers/mtd/ubi/build.c | 57 ++++++++++++
drivers/mtd/ubi/fastmap.c | 14 ++-
drivers/mtd/ubi/ubi-media.h | 32 ++++++-
drivers/mtd/ubi/ubi.h | 34 +++++++
drivers/mtd/ubi/wl.c | 6 ++
6 files changed, 280 insertions(+), 8 deletions(-)
create mode 100644 Documentation/mtd/ubi/ubi-read-disturb.txt

diff --git a/Documentation/mtd/ubi/ubi-read-disturb.txt b/Documentation/mtd/ubi/ubi-read-disturb.txt
new file mode 100644
index 0000000..4d3efef
--- /dev/null
+++ b/Documentation/mtd/ubi/ubi-read-disturb.txt
@@ -0,0 +1,145 @@
+
+1. Introduction
+===============
+Raw NAND flash memories are one of the most common storage devices in present
+day embedded systems. The most common devices in which one can find raw NAND
+flash are mobile phones.
+One of the limitations of NAND devices is that the method used to read NAND
+flash memory may cause bit-flips in the surrounding cells and result in
+uncorrectable ECC errors. This is known as read disturb. A related limitation
+is data retention failure.
+Today's Linux NAND driver implementations do not address the read disturb and
+data retention limitations of NAND devices.
+
+
+2. The problem
+==============
+There are two characteristics of the raw NAND that are not addressed by the
+NAND driver at the moment:
+
+2.1 Read Disturb
+----------------
+The method used to read NAND flash memory can cause nearby cells in the same
+memory block to change their value over time (become programmed). This
+phenomenon is known as read disturb. The threshold number of reads that leads
+to this issue is generally in the hundreds of thousands between intervening
+erase operations. When reading continuously from one cell, that cell will not
+fail but rather one of the surrounding cells may fail on a subsequent read. If
+read disturb is not addressed, there is a high possibility of data loss - if
+the errors are too numerous to correct.
+
+2.2 Data Retention
+------------------
+Another NAND flash limitation is Data Retention (of rarely accessed blocks).
+The ability of the NAND device to remain in its programmed state decreases over
+time.
+
+To date these issues could be overlooked since the probability of their
+occurrence in today's NAND devices is very low. With the evolution of NAND
+devices and the requirement for "long life" NAND flash, read disturb and
+data retention can no longer be ignored; otherwise data will be lost over time.
+
+
+3. The Solution
+===============
+Both types of blocks described above (read disturb and data retention) are
+handled by means of scrubbing. Scrubbing in essence is:
+- Copy the data from block X to new block Y
+- Erase block X
+
+3.1 Handling Read disturb blocks
+--------------------------------
+3.1.1 Identification
+In order to identify potential read-disturb blocks, a read counter is
+maintained for each PEB. The read counter is incremented as part of each read
+operation, and is reset in every erase operation.
+In each read operation the read counter is checked against a threshold. This
+counter is also checked at initialization, when attaching UBI to an MTD device.
+
+3.1.2 Saving on NAND
+Due to the physical characteristics of the NAND flash memory, write operations
+can only be performed on an erased block. Due to this, the read counter can't
+be saved as part of the meta-data that is saved on flash per each erase block,
+and therefore can exist only in RAM. Once we power off the device, the read
+counter will no longer be valid. In order to overcome this issue and to save
+the read counter's value through reboots of the system, it is saved as part of
+the fastmap data on the flash.
+
+3.1.3 Error recovery
+It is possible that the fastmap data won't be valid on boot up - for example if
+a sudden power cut occurred. In such a case a default value will be assigned to
+each PEB. The default value for the read counter will be assigned as follows:
+- Free erase blocks: It's safe to assume that the read counter for free
+ blocks was 0 prior to the power off since a block is marked as "free"
+ after it was erased. Such blocks will be assigned read counter 0.
+- Allocated erase blocks: We can make no assumptions on the number of
+ reads performed on allocated data blocks. To be on the safe side the
+ default read counter assigned to these blocks is
+ read_disturb_threshold/2.
+
+3.1.4 Enhancements to Fastmap (work in progress)
+In order to lower the possibility of fastmap being invalid on boot up we
+increase the pool of events which trigger the fastmap data being saved on
+flash. A global read counter is maintained per UBI device. It is incremented as
+part of each read operation that is performed on any of the device PEBs. When
+a pre-defined threshold is reached, a fastmap flush will be scheduled. This
+counter is reset on each flush of the fastmap data.
+
+3.1.5 "Fixing" the Read disturbed blocks
+If the read counter reaches a pre-defined threshold the block will be scheduled
+for scrubbing.
+
+
+3.2 Data Retention blocks
+-------------------------
+3.2.1 Identification
+In order to identify rarely accessed blocks a "last erase timestamp" is
+maintained per PEB. The resolution of this timestamp is in days and it is
+updated during each erase operation performed on a PEB.
+This timestamp is verified at initialization, when attaching UBI to an MTD
+device. If the delta between time of verification and the last_erase_timestamp
+is higher than a pre-defined threshold, the PEB will be scheduled for
+scrubbing.
+In order to identify data retention blocks, outside intervention is required
+in the form of a user space application. This app will be periodically
+activated by the user and will trigger the scanning of all of the flash PEBs
+and the verification of the last erase timestamp of each PEB against a
+pre-defined threshold.
+When activating the user space utility, one should keep in mind that this
+process will take some time. It is therefore recommended to activate it
+during device idle time.
+
+3.2.2 Saving on NAND
+The last erase timestamp is saved as part of the meta-data of each PEB on NAND.
+It is saved as part of the fastmap meta-data as well. In case no fastmap is
+available, it will be retrieved from the PEB meta-data saved on flash. If it's
+missing on the flash as well, a default value equal to the average of the erase
+timestamps of the other PEBs of the device will be assigned.
+
+
+4. Backward compatibility of the proposed solution
+==================================================
+As mentioned before, read counters can only be saved as part of the fastmap
+meta-data. Since the fastmap layout changes, a new fastmap version is defined,
+one that supports read disturb meta-data.
+When loading an older image, which doesn't support read disturb, the fastmap
+(if present) will be found invalid and the attach process will trigger
+scanning of the whole device. A default read counter will be assigned to each
+PEB, as described in section 3.1.3.
+The default last erase timestamp will be set according to the average timestamp
+of all PEBs of the device. In case of an old image, where no last erase
+timestamp is present, a default value of last_erase_timestamp_threshold/2 will
+be assigned.
+
+
+5. Conclusions
+==============
+The described solution addresses both the read disturb and the data retention
+issues, thereby allowing long-life usage of NAND devices.
+The downside of the proposed solution is that the meta-data increases, and as
+a result the size of the fastmap data also increases.
+In our testing no performance impact was observed since the verification or
+saving of the counters/timestamp is performed in O(1).
+The solution above is implemented with minimal code changes since it reuses
+the scrubbing mechanism already implemented in the UBI wear-leveling
+subsystem.
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
index 6e30a3c..34fe23a 100644
--- a/drivers/mtd/ubi/build.c
+++ b/drivers/mtd/ubi/build.c
@@ -1,6 +1,9 @@
/*
* Copyright (c) International Business Machines Corp., 2006
* Copyright (c) Nokia Corporation, 2007
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ * Linux Foundation chooses to take subject only to the GPLv2
+ * license terms, and distributes only under these terms.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -118,6 +121,10 @@ static struct class_attribute ubi_version =
static ssize_t dev_attribute_show(struct device *dev,
struct device_attribute *attr, char *buf);

+static ssize_t dev_attribute_store(struct device *dev,
+ struct device_attribute *attr, const char *buf,
+ size_t count);
+
/* UBI device attributes (correspond to files in '/<sysfs>/class/ubi/ubiX') */
static struct device_attribute dev_eraseblock_size =
__ATTR(eraseblock_size, S_IRUGO, dev_attribute_show, NULL);
@@ -141,6 +148,12 @@ static struct device_attribute dev_bgt_enabled =
__ATTR(bgt_enabled, S_IRUGO, dev_attribute_show, NULL);
static struct device_attribute dev_mtd_num =
__ATTR(mtd_num, S_IRUGO, dev_attribute_show, NULL);
+static struct device_attribute dev_dt_threshold =
+ __ATTR(dt_threshold, (S_IWUSR | S_IRUGO), dev_attribute_show,
+ dev_attribute_store);
+static struct device_attribute dev_rd_threshold =
+ __ATTR(rd_threshold, (S_IWUSR | S_IRUGO), dev_attribute_show,
+ dev_attribute_store);

/**
* ubi_volume_notify - send a volume change notification.
@@ -378,6 +391,10 @@ static ssize_t dev_attribute_show(struct device *dev,
ret = sprintf(buf, "%d\n", ubi->thread_enabled);
else if (attr == &dev_mtd_num)
ret = sprintf(buf, "%d\n", ubi->mtd->index);
+ else if (attr == &dev_dt_threshold)
+ ret = sprintf(buf, "%d\n", ubi->dt_threshold);
+ else if (attr == &dev_rd_threshold)
+ ret = sprintf(buf, "%d\n", ubi->rd_threshold);
else
ret = -EINVAL;

@@ -385,6 +402,38 @@ static ssize_t dev_attribute_show(struct device *dev,
return ret;
}

+static ssize_t dev_attribute_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ int value;
+ struct ubi_device *ubi;
+
+ ubi = container_of(dev, struct ubi_device, dev);
+ ubi = ubi_get_device(ubi->ubi_num);
+ if (!ubi)
+ return -ENODEV;
+
+ if (kstrtos32(buf, 10, &value))
+ return -EINVAL;
+ /* Consider triggering full scan if thresholds change */
+ else if (attr == &dev_dt_threshold) {
+ if (value < UBI_MAX_DT_THRESHOLD)
+ ubi->dt_threshold = value;
+ else
+ pr_err("Max supported threshold value is %d\n",
+ UBI_MAX_DT_THRESHOLD);
+ } else if (attr == &dev_rd_threshold) {
+ if (value < UBI_MAX_READCOUNTER)
+ ubi->rd_threshold = value;
+ else
+ pr_err("Max supported threshold value is %d\n",
+ UBI_MAX_READCOUNTER);
+ }
+
+ return count;
+}
+
static void dev_release(struct device *dev)
{
struct ubi_device *ubi = container_of(dev, struct ubi_device, dev);
@@ -445,6 +494,12 @@ static int ubi_sysfs_init(struct ubi_device *ubi, int *ref)
if (err)
return err;
err = device_create_file(&ubi->dev, &dev_mtd_num);
+ if (err)
+ return err;
+ err = device_create_file(&ubi->dev, &dev_dt_threshold);
+ if (err)
+ return err;
+ err = device_create_file(&ubi->dev, &dev_rd_threshold);
return err;
}

@@ -455,6 +510,8 @@ static int ubi_sysfs_init(struct ubi_device *ubi, int *ref)
static void ubi_sysfs_close(struct ubi_device *ubi)
{
device_remove_file(&ubi->dev, &dev_mtd_num);
+ device_remove_file(&ubi->dev, &dev_dt_threshold);
+ device_remove_file(&ubi->dev, &dev_rd_threshold);
device_remove_file(&ubi->dev, &dev_bgt_enabled);
device_remove_file(&ubi->dev, &dev_min_io_size);
device_remove_file(&ubi->dev, &dev_max_vol_count);
diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 0431b46..5399aa2 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -1,5 +1,7 @@
/*
* Copyright (c) 2012 Linutronix GmbH
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ *
* Author: Richard Weinberger <richard@xxxxxx>
*
* This program is free software; you can redistribute it and/or modify
@@ -727,9 +729,9 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
}

for (j = 0; j < be32_to_cpu(fm_eba->reserved_pebs); j++) {
- int pnum = be32_to_cpu(fm_eba->pnum[j]);
+ int pnum = be32_to_cpu(fm_eba->peb_data[j].pnum);

- if ((int)be32_to_cpu(fm_eba->pnum[j]) < 0)
+ if ((int)be32_to_cpu(fm_eba->peb_data[j].pnum) < 0)
continue;

aeb = NULL;
@@ -757,7 +759,8 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
}

aeb->lnum = j;
- aeb->pnum = be32_to_cpu(fm_eba->pnum[j]);
+ aeb->pnum =
+ be32_to_cpu(fm_eba->peb_data[j].pnum);
aeb->ec = -1;
aeb->scrub = aeb->copy_flag = aeb->sqnum = 0;
list_add_tail(&aeb->u.list, &eba_orphans);
@@ -1250,11 +1253,12 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
vol->vol_type == UBI_STATIC_VOLUME);

feba = (struct ubi_fm_eba *)(fm_raw + fm_pos);
- fm_pos += sizeof(*feba) + (sizeof(__be32) * vol->reserved_pebs);
+ fm_pos += sizeof(*feba) +
+ 2 * (sizeof(__be32) * vol->reserved_pebs);
ubi_assert(fm_pos <= ubi->fm_size);

for (j = 0; j < vol->reserved_pebs; j++)
- feba->pnum[j] = cpu_to_be32(vol->eba_tbl[j]);
+ feba->peb_data[j].pnum = cpu_to_be32(vol->eba_tbl[j]);

feba->reserved_pebs = cpu_to_be32(j);
feba->magic = cpu_to_be32(UBI_FM_EBA_MAGIC);
diff --git a/drivers/mtd/ubi/ubi-media.h b/drivers/mtd/ubi/ubi-media.h
index ac2b24d..da418ad 100644
--- a/drivers/mtd/ubi/ubi-media.h
+++ b/drivers/mtd/ubi/ubi-media.h
@@ -1,5 +1,8 @@
/*
* Copyright (c) International Business Machines Corp., 2006
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ * Linux Foundation chooses to take subject only to the GPLv2
+ * license terms, and distributes only under these terms.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -38,6 +41,15 @@
/* The highest erase counter value supported by this implementation */
#define UBI_MAX_ERASECOUNTER 0x7FFFFFFF

+/* The highest read counter value supported by this implementation */
+#define UBI_MAX_READCOUNTER 0x7FFFFFFD /* (0x7FFFFFFF - 2)*/
+
+/*
+ * The highest data retention threshold value supported
+ * by this implementation
+ */
+#define UBI_MAX_DT_THRESHOLD 0x7FFFFFFF
+
/* The initial CRC32 value used when calculating CRC checksums */
#define UBI_CRC32_INIT 0xFFFFFFFFU

@@ -130,6 +142,7 @@ enum {
* @vid_hdr_offset: where the VID header starts
* @data_offset: where the user data start
* @image_seq: image sequence number
+ * @last_erase_time: time stamp of the last erase operation
* @padding2: reserved for future, zeroes
* @hdr_crc: erase counter header CRC checksum
*
@@ -162,7 +175,8 @@ struct ubi_ec_hdr {
__be32 vid_hdr_offset;
__be32 data_offset;
__be32 image_seq;
- __u8 padding2[32];
+ __be64 last_erase_time; /* current time in seconds */
+ __u8 padding2[24];
__be32 hdr_crc;
} __packed;

@@ -413,6 +427,8 @@ struct ubi_vtbl_record {
* @used_blocks: number of PEBs used by this fastmap
* @block_loc: an array containing the location of all PEBs of the fastmap
* @block_ec: the erase counter of each used PEB
+ * @block_rc: the read counter of each used PEB
+ * @block_let: the last erase timestamp of each used PEB
* @sqnum: highest sequence number value at the time while taking the fastmap
*
*/
@@ -424,6 +440,8 @@ struct ubi_fm_sb {
__be32 used_blocks;
__be32 block_loc[UBI_FM_MAX_BLOCKS];
__be32 block_ec[UBI_FM_MAX_BLOCKS];
+ __be32 block_rc[UBI_FM_MAX_BLOCKS];
+ __be64 block_let[UBI_FM_MAX_BLOCKS];
__be64 sqnum;
__u8 padding2[32];
} __packed;
@@ -469,13 +487,17 @@ struct ubi_fm_scan_pool {
/* ubi_fm_scan_pool is followed by nfree+nused struct ubi_fm_ec records */

/**
- * struct ubi_fm_ec - stores the erase counter of a PEB
+ * struct ubi_fm_ec - stores the erase/read counter of a PEB
* @pnum: PEB number
* @ec: ec of this PEB
+ * @rc: rc of this PEB
+ * @last_erase_time: last erase time stamp of this PEB
*/
struct ubi_fm_ec {
__be32 pnum;
__be32 ec;
+ __be32 rc;
+ __be64 last_erase_time;
} __packed;

/**
@@ -506,10 +528,14 @@ struct ubi_fm_volhdr {
* @magic: EBA table magic number
* @reserved_pebs: number of table entries
* @pnum: PEB number of LEB (LEB is the index)
+ * @rc: Read counter of the LEB's PEB (LEB is the index)
*/
struct ubi_fm_eba {
__be32 magic;
__be32 reserved_pebs;
- __be32 pnum[0];
+ struct {
+ __be32 pnum;
+ __be32 rc;
+ } peb_data[0];
} __packed;
#endif /* !__UBI_MEDIA_H__ */
diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h
index 7bf4163..6c7e53e 100644
--- a/drivers/mtd/ubi/ubi.h
+++ b/drivers/mtd/ubi/ubi.h
@@ -1,6 +1,9 @@
/*
* Copyright (c) International Business Machines Corp., 2006
* Copyright (c) Nokia Corporation, 2006, 2007
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ * Linux Foundation chooses to take subject only to the GPLv2
+ * license terms, and distributes only under these terms.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -84,6 +87,22 @@
#define UBI_UNKNOWN -1

/*
+ * This parameter defines the maximum read counter of eraseblocks
+ * of UBI devices. When this threshold is reached, UBI schedules the
+ * eraseblock for scrubbing: its data is moved to another eraseblock
+ * and the original eraseblock is erased.
+ */
+#define UBI_RD_THRESHOLD 100000
+
+/*
+ * This parameter defines the maximum interval (in days) between two
+ * erasures of an eraseblock. When this interval is exceeded, UBI
+ * schedules the eraseblock for scrubbing: its data is moved to another
+ * eraseblock and the original eraseblock is erased.
+ */
+#define UBI_DT_THRESHOLD 120
+
+/*
* The UBI debugfs directory name pattern and maximum name length (3 for "ubi"
* + 2 for the number plus 1 for the trailing zero byte.
*/
@@ -155,6 +174,8 @@ enum {
* @u.rb: link in the corresponding (free/used) RB-tree
* @u.list: link in the protection queue
* @ec: erase counter
+ * @last_erase_time: time stamp of the last erase operation
+ * @rc: read counter
* @pnum: physical eraseblock number
*
* This data structure is used in the WL sub-system. Each physical eraseblock
@@ -167,6 +188,8 @@ struct ubi_wl_entry {
struct list_head list;
} u;
int ec;
+ long last_erase_time;
+ int rc;
int pnum;
};

@@ -451,6 +474,10 @@ struct ubi_debug_info {
* @bgt_thread: background thread description object
* @thread_enabled: if the background thread is enabled
* @bgt_name: background thread name
+ * @rd_threshold: read counter threshold. See UBI_RD_THRESHOLD
+ * for more info
+ * @dt_threshold: data retention threshold. See UBI_DT_THRESHOLD
+ * for more info
*
* @flash_size: underlying MTD device size (in bytes)
* @peb_count: count of physical eraseblocks on the MTD device
@@ -553,6 +580,9 @@ struct ubi_device {
struct task_struct *bgt_thread;
int thread_enabled;
char bgt_name[sizeof(UBI_BGT_NAME_PATTERN)+2];
+ int rd_threshold;
+ int dt_threshold;
+

/* I/O sub-system's stuff */
long long flash_size;
@@ -588,6 +618,8 @@ struct ubi_device {
/**
* struct ubi_ainf_peb - attach information about a physical eraseblock.
* @ec: erase counter (%UBI_UNKNOWN if it is unknown)
+ * @rc: read counter (%UBI_UNKNOWN if it is unknown)
+ * @last_erase_time: last erase time stamp (%UBI_UNKNOWN if it is unknown)
* @pnum: physical eraseblock number
* @vol_id: ID of the volume this LEB belongs to
* @lnum: logical eraseblock number
@@ -604,6 +636,8 @@ struct ubi_device {
*/
struct ubi_ainf_peb {
int ec;
+ int rc;
+ long last_erase_time;
int pnum;
int vol_id;
int lnum;
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 20f4917..33d33e43 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -1,5 +1,8 @@
/*
* Copyright (c) International Business Machines Corp., 2006
+ * Copyright (c) 2014, Linux Foundation. All rights reserved.
+ * Linux Foundation chooses to take subject only to the GPLv2
+ * license terms, and distributes only under these terms.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -1898,6 +1901,9 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
INIT_LIST_HEAD(&ubi->pq[i]);
ubi->pq_head = 0;

+ ubi->rd_threshold = UBI_RD_THRESHOLD;
+ ubi->dt_threshold = UBI_DT_THRESHOLD;
+
list_for_each_entry_safe(aeb, tmp, &ai->erase, u.list) {
cond_resched();

--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
