Re: [PATCH v7] mtd: rawnand: use bit-wise majority to recover the contents of ONFI parameter

From: Chris Moore
Date: Wed May 16 2018 - 23:56:12 EST


Hi,

Le 16/05/2018 Ã 09:56, Boris Brezillon a ÃcritÂ:
On Wed, 16 May 2018 09:32:57 +0200
Chris Moore <moore@xxxxxxx> wrote:

Hi,

Le 15/05/2018 Ã 09:34, Boris Brezillon a ÃcritÂ:
On Tue, 15 May 2018 06:45:51 +0200
Chris Moore <moore@xxxxxxx> wrote:
Hi,

Le 13/05/2018 Ã 06:30, Wan, Jane (Nokia - US/Sunnyvale) a ÃcritÂ:
Per ONFI specification (Rev. 4.0), if all parameter pages have invalid CRC values, the bit-wise majority may be used to recover the contents of the parameter pages from the parameter page copies present.

Signed-off-by: Jane Wan <Jane.Wan@xxxxxxxxx>
---
v7: change debug print messages
v6: support the cases that srcbufs are not contiguous
v5: make the bit-wise majority functon generic
v4: move the bit-wise majority code in a separate function
v3: fix warning message detected by kbuild test robot
v2: rebase the changes on top of v4.17-rc1
drivers/mtd/nand/raw/nand_base.c | 50 ++++++++++++++++++++++++++++++++++----
1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 72f3a89..b43b784 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -5087,6 +5087,35 @@ static int nand_flash_detect_ext_param_page(struct nand_chip *chip,
}
/*
+ * Recover data with bit-wise majority
+ */
+static void nand_bit_wise_majority(const void **srcbufs,
+ unsigned int nsrcbufs,
+ void *dstbuf,
+ unsigned int bufsize)
+{
+ int i, j, k;
+
+ for (i = 0; i < bufsize; i++) {
+ u8 cnt, val;
+
+ val = 0;
+ for (j = 0; j < 8; j++) {
+ cnt = 0;
+ for (k = 0; k < nsrcbufs; k++) {
+ const u8 *srcbuf = srcbufs[k];
+
+ if (srcbuf[i] & BIT(j))
+ cnt++;
+ }
+ if (cnt > nsrcbufs / 2)
+ val |= BIT(j);
+ }
+ ((u8 *)dstbuf)[i] = val;
+ }
+}
+
+/*
* Check if the NAND chip is ONFI compliant, returns 1 if it is, 0 otherwise.
*/
static int nand_flash_detect_onfi(struct nand_chip *chip)
@@ -5102,7 +5131,7 @@ static int nand_flash_detect_onfi(struct nand_chip *chip)
return 0;
/* ONFI chip: allocate a buffer to hold its parameter page */
- p = kzalloc(sizeof(*p), GFP_KERNEL);
+ p = kzalloc((sizeof(*p) * 3), GFP_KERNEL);
if (!p)
return -ENOMEM;
@@ -5113,21 +5142,32 @@ static int nand_flash_detect_onfi(struct nand_chip *chip)
}
for (i = 0; i < 3; i++) {
- ret = nand_read_data_op(chip, p, sizeof(*p), true);
+ ret = nand_read_data_op(chip, &p[i], sizeof(*p), true);
if (ret) {
ret = 0;
goto free_onfi_param_page;
}
- if (onfi_crc16(ONFI_CRC_BASE, (uint8_t *)p, 254) ==
+ if (onfi_crc16(ONFI_CRC_BASE, (u8 *)&p[i], 254) ==
le16_to_cpu(p->crc)) {
+ if (i)
+ memcpy(p, &p[i], sizeof(*p));
break;
}
}
if (i == 3) {
- pr_err("Could not find valid ONFI parameter page; aborting\n");
- goto free_onfi_param_page;
+ const void *srcbufs[3] = {p, p + 1, p + 2};
+
+ pr_warn("Could not find a valid ONFI parameter page, trying bit-wise majority to recover it\n");
+ nand_bit_wise_majority(srcbufs, ARRAY_SIZE(srcbufs), p,
+ sizeof(*p));
+
+ if (onfi_crc16(ONFI_CRC_BASE, (u8 *)p, 254) !=
+ le16_to_cpu(p->crc)) {
+ pr_err("ONFI parameter recovery failed, aborting\n");
+ goto free_onfi_param_page;
+ }
}
/* Check version */
This version is still hard coded for a three sample bitwise majority vote.
So why not use the method which I suggested previously for v2 and which
I repeat below?
Because I want the nand_bit_wise_majority() function to work with
nsrcbufs > 3 (the ONFI spec says there's at least 3 copy of the param
page, but NAND vendor can decide to put more). Also, if the X copies of
the PARAM are corrupted (which is rather unlikely), that means we
already spent quite a lot of time reading the different copies and
calculating the CRC, so I think we don't care about perf optimizations
when doing bit-wise majority.
The three sample bitwise majority can be implemented without bit level
manipulation using the identity:
majority3(a, b, c) = (a & b) | (a & c) | (b & c)
This can be factorized slightly to (a & (b | c)) | (b & c)
This enables the operation to be performed 8, 16, 32 or even 64 bits at
a time depending on the hardware.

This method is not only faster and but also more compact.
I do understand that the ONFI specifications permit more than 3 copies.
However elsewhere the proposed code shows no intention of handling other
cases.
The constant 3 is hard coded in the following lines extracted from the
proposed code:
...
+ÂÂÂ p = kzalloc((sizeof(*p) * 3), GFP_KERNEL);
...
ÂÂÂÂ for (i = 0; i < 3; i++) {
...
ÂÂÂÂ if (i == 3) {
...
+ÂÂÂ ÂÂÂ const void *srcbufs[3] = {p, p + 1, p + 2};

Moreover the last of these is difficult to generalize.
Not that much. We just have to allocate srcbufs dynamically and
krealloc() everytime we want to add a new entry.

May I ask why you care that much about this optimization? As I said, if
we really have to read all the copies to realize none of them is good,
we already lost a lot of time, so having a "suboptimal but generic"
version of the bit-wise majority shouldn't hurt that much.


I just wanted to point out that, as the proposed code only handles the case of 3 copies, there is a much more efficient way of performing the majority vote than looping over each bit.
My method would also reduce the code size which is an important factor in the kernel.
Otherwise I have no objection and I shall accept your choice without further remarks on this point.

Cheers,
Chris