Re: [PATCH v7] mtd: rawnand: use bit-wise majority to recover the contents of ONFI parameter

From: Boris Brezillon
Date: Wed May 16 2018 - 03:24:08 EST


On Wed, 16 May 2018 09:32:57 +0200
Chris Moore <moore@xxxxxxx> wrote:

> Hi,
>
> Le 15/05/2018 Ã 09:34, Boris Brezillon a ÃcritÂ:
> > On Tue, 15 May 2018 06:45:51 +0200
> > Chris Moore <moore@xxxxxxx> wrote:
> >
> >> Hi,
> >>
> >> Le 13/05/2018 Ã 06:30, Wan, Jane (Nokia - US/Sunnyvale) a ÃcritÂ:
> >>> Per ONFI specification (Rev. 4.0), if all parameter pages have invalid CRC values, the bit-wise majority may be used to recover the contents of the parameter pages from the parameter page copies present.
> >>>
> >>> Signed-off-by: Jane Wan <Jane.Wan@xxxxxxxxx>
> >>> ---
> >>> v7: change debug print messages
> >>> v6: support the cases that srcbufs are not contiguous
> >>> v5: make the bit-wise majority functon generic
> >>> v4: move the bit-wise majority code in a separate function
> >>> v3: fix warning message detected by kbuild test robot
> >>> v2: rebase the changes on top of v4.17-rc1
> >>>
> >>> drivers/mtd/nand/raw/nand_base.c | 50 ++++++++++++++++++++++++++++++++++----
> >>> 1 file changed, 45 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> >>> index 72f3a89..b43b784 100644
> >>> --- a/drivers/mtd/nand/raw/nand_base.c
> >>> +++ b/drivers/mtd/nand/raw/nand_base.c
> >>> @@ -5087,6 +5087,35 @@ static int nand_flash_detect_ext_param_page(struct nand_chip *chip,
> >>> }
> >>>
> >>> /*
> >>> + * Recover data with bit-wise majority
> >>> + */
> >>> +static void nand_bit_wise_majority(const void **srcbufs,
> >>> + unsigned int nsrcbufs,
> >>> + void *dstbuf,
> >>> + unsigned int bufsize)
> >>> +{
> >>> + int i, j, k;
> >>> +
> >>> + for (i = 0; i < bufsize; i++) {
> >>> + u8 cnt, val;
> >>> +
> >>> + val = 0;
> >>> + for (j = 0; j < 8; j++) {
> >>> + cnt = 0;
> >>> + for (k = 0; k < nsrcbufs; k++) {
> >>> + const u8 *srcbuf = srcbufs[k];
> >>> +
> >>> + if (srcbuf[i] & BIT(j))
> >>> + cnt++;
> >>> + }
> >>> + if (cnt > nsrcbufs / 2)
> >>> + val |= BIT(j);
> >>> + }
> >>> + ((u8 *)dstbuf)[i] = val;
> >>> + }
> >>> +}
> >>> +
> >>> +/*
> >>> * Check if the NAND chip is ONFI compliant, returns 1 if it is, 0 otherwise.
> >>> */
> >>> static int nand_flash_detect_onfi(struct nand_chip *chip)
> >>> @@ -5102,7 +5131,7 @@ static int nand_flash_detect_onfi(struct nand_chip *chip)
> >>> return 0;
> >>>
> >>> /* ONFI chip: allocate a buffer to hold its parameter page */
> >>> - p = kzalloc(sizeof(*p), GFP_KERNEL);
> >>> + p = kzalloc((sizeof(*p) * 3), GFP_KERNEL);
> >>> if (!p)
> >>> return -ENOMEM;
> >>>
> >>> @@ -5113,21 +5142,32 @@ static int nand_flash_detect_onfi(struct nand_chip *chip)
> >>> }
> >>>
> >>> for (i = 0; i < 3; i++) {
> >>> - ret = nand_read_data_op(chip, p, sizeof(*p), true);
> >>> + ret = nand_read_data_op(chip, &p[i], sizeof(*p), true);
> >>> if (ret) {
> >>> ret = 0;
> >>> goto free_onfi_param_page;
> >>> }
> >>>
> >>> - if (onfi_crc16(ONFI_CRC_BASE, (uint8_t *)p, 254) ==
> >>> + if (onfi_crc16(ONFI_CRC_BASE, (u8 *)&p[i], 254) ==
> >>> le16_to_cpu(p->crc)) {
> >>> + if (i)
> >>> + memcpy(p, &p[i], sizeof(*p));
> >>> break;
> >>> }
> >>> }
> >>>
> >>> if (i == 3) {
> >>> - pr_err("Could not find valid ONFI parameter page; aborting\n");
> >>> - goto free_onfi_param_page;
> >>> + const void *srcbufs[3] = {p, p + 1, p + 2};
> >>> +
> >>> + pr_warn("Could not find a valid ONFI parameter page, trying bit-wise majority to recover it\n");
> >>> + nand_bit_wise_majority(srcbufs, ARRAY_SIZE(srcbufs), p,
> >>> + sizeof(*p));
> >>> +
> >>> + if (onfi_crc16(ONFI_CRC_BASE, (u8 *)p, 254) !=
> >>> + le16_to_cpu(p->crc)) {
> >>> + pr_err("ONFI parameter recovery failed, aborting\n");
> >>> + goto free_onfi_param_page;
> >>> + }
> >>> }
> >>>
> >>> /* Check version */
> >> This version is still hard coded for a three sample bitwise majority vote.
> >> So why not use the method which I suggested previously for v2 and which
> >> I repeat below?
> > Because I want the nand_bit_wise_majority() function to work with
> > nsrcbufs > 3 (the ONFI spec says there's at least 3 copy of the param
> > page, but NAND vendor can decide to put more). Also, if the X copies of
> > the PARAM are corrupted (which is rather unlikely), that means we
> > already spent quite a lot of time reading the different copies and
> > calculating the CRC, so I think we don't care about perf optimizations
> > when doing bit-wise majority.
> >
> >> The three sample bitwise majority can be implemented without bit level
> >> manipulation using the identity:
> >> majority3(a, b, c) = (a & b) | (a & c) | (b & c)
> >> This can be factorized slightly to (a & (b | c)) | (b & c)
> >> This enables the operation to be performed 8, 16, 32 or even 64 bits at
> >> a time depending on the hardware.
> >>
> >> This method is not only faster and but also more compact.
> >>
>
> I do understand that the ONFI specifications permit more than 3 copies.
> However elsewhere the proposed code shows no intention of handling other
> cases.
> The constant 3 is hard coded in the following lines extracted from the
> proposed code:
> ...
> +ÂÂÂ p = kzalloc((sizeof(*p) * 3), GFP_KERNEL);
> ...
> ÂÂÂÂ for (i = 0; i < 3; i++) {
> ...
> ÂÂÂÂ if (i == 3) {
> ...
> +ÂÂÂ ÂÂÂ const void *srcbufs[3] = {p, p + 1, p + 2};
>
> Moreover the last of these is difficult to generalize.

Not that much. We just have to allocate srcbufs dynamically and
krealloc() everytime we want to add a new entry.

May I ask why you care that much about this optimization? As I said, if
we really have to read all the copies to realize none of them is good,
we already lost a lot of time, so having a "suboptimal but generic"
version of the bit-wise majority shouldn't hurt that much.