# Re: Re: [PATCH] mtd/nand/nand_ecc.c: replace bitsperbyte table by asimple operation

**From: **Zhaoxiu Zeng

**Date: ** Mon Mar 26 2012 - 09:10:31 EST

Dear Paul and Frans

Thanks for your reply.

I don't write a commit log because I have ugly eglish! :)

For all words which has only one "1" bit, "!(tmp & (tmp - 1))"

must be true, others except zero must be false!

In this instance we just need to check whether the result has

only one "1" bit, so "!(tmp & (tmp - 1))" is enough, and faster.

Rest changes are adjuvant.

>* Dear Zhaoxiu*

>* *

>* Thanks for contributing this patch.*

>* However, I think it has some issues and should not be applied. See my*

>* remarks below*

>* *

>* 2012/3/24 Zhaoxiu Zeng <zengzhaoxiu <at> 163.com>:*

>* >*

>* > Signed-off-by: Zhaoxiu Zeng <zengzhaoxiu <at> 163.com>*

>* >*

>* > ---*

>* > drivers/mtd/nand/nand_ecc.c | 50 +++++++++---------------------------------*

>* > 1 files changed, 11 insertions(+), 39 deletions(-)*

>* >*

>* > diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c*

>* > index b7cfe0d..e05a0fd 100644*

>* > --- a/drivers/mtd/nand/nand_ecc.c*

>* > +++ b/drivers/mtd/nand/nand_ecc.c*

>* > @@ -85,30 +85,6 @@ static const char invparity[256] = {*

>* > };*

>* >*

>* > /**

>* > - * bitsperbyte contains the number of bits per byte*

>* > - * this is only used for testing and repairing parity*

>* > - * (a precalculated value slightly improves performance)*

>* > - */*

>* > -static const char bitsperbyte[256] = {*

>* > - 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,*

>* > - 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,*

>* > - 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,*

>* > - 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,*

>* > - 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,*

>* > - 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,*

>* > - 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,*

>* > - 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,*

>* > - 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,*

>* > - 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,*

>* > - 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,*

>* > - 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,*

>* > - 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,*

>* > - 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,*

>* > - 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,*

>* > - 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,*

>* > -};*

>* > -*

>* > -/**

>* > * addressbits is a lookup table to filter out the bits from the xor-ed*

>* > * ECC data that identify the faulty location.*

>* > * this is only used for repairing parity*

>* > @@ -448,12 +424,14 @@ int __nand_correct_data(unsigned char *buf,*

>* > unsigned int byte_addr;*

>* > /* 256 or 512 bytes/ecc */*

>* > const uint32_t eccsize_mult = eccsize >> 8;*

>* > + const uint32_t check_mask = eccsize_mult == 2 ? 0x555555 : 0x545555;*

>* > + uint32_t tmp;*

>* >*

>* > /**

>* > * b0 to b2 indicate which bit is faulty (if any)*

>* > * we might need the xor result more than once,*

>* > * so keep them in a local var*

>* > - */*

>* > + */*

>* > #ifdef CONFIG_MTD_NAND_ECC_SMC*

>* > b0 = read_ecc[0] ^ calc_ecc[0];*

>* > b1 = read_ecc[1] ^ calc_ecc[1];*

>* > @@ -467,14 +445,11 @@ int __nand_correct_data(unsigned char *buf,*

>* >*

>* > /* repeated if statements are slightly more efficient than switch ... */*

>* > /* ordered in order of likelihood */*

>* > -*

>* > - if ((b0 | b1 | b2) == 0)*

>* > + tmp = ((uint32_t)b2 << 16) | ((uint32_t)b1 << 8) | b0;*

>* *

>* Not sure how efficient this is; on some systems shifting by N takes N*

>* clock cycles so this could be relatively expensive.*

>* (and the systems where it takes more clock cycles have a higher chance*

>* on not having hw crc correction).*

>* *

>* > + if (tmp == 0)*

>* > return 0; /* no error */*

>* >*

>* > - if ((((b0 ^ (b0 >> 1)) & 0x55) == 0x55) &&*

>* > - (((b1 ^ (b1 >> 1)) & 0x55) == 0x55) &&*

>* > - ((eccsize_mult == 1 && ((b2 ^ (b2 >> 1)) & 0x54) == 0x54) ||*

>* > - (eccsize_mult == 2 && ((b2 ^ (b2 >> 1)) & 0x55) == 0x55))) {*

>* > + if (((tmp ^ (tmp >> 1)) & check_mask) == check_mask) {*

>* *

>* Have you benchmarked this?*

>* *

>* And while it might be an optimisation, it is not mentioned in the*

>* commit message.*

>* *

>* > /* single bit error */*

>* > /**

>* > * rp17/rp15/13/11/9/7/5/3/1 indicate which byte is the faulty*

>* > @@ -492,19 +467,16 @@ int __nand_correct_data(unsigned char *buf,*

>* > * We could also do addressbits[b2] >> 1 but for the*

>* > * performance it does not make any difference*

>* > */*

>* > - if (eccsize_mult == 1)*

>* > - byte_addr = (addressbits[b1] << 4) + addressbits[b0];*

>* > - else*

>* > - byte_addr = (addressbits[b2 & 0x3] << 8) +*

>* > - (addressbits[b1] << 4) + addressbits[b0];*

>* > + byte_addr = (addressbits[b1] << 4) + addressbits[b0];*

>* > + if (eccsize_mult == 2)*

>* > + byte_addr += (addressbits[b2 & 0x3] << 8);*

>* *

>* I'm not sure if this is a win. It *looks* like an optimisation since*

>* you factor out some common code. Then again within the if you need to*

>* add to byte_addr.*

>* For a naive compiler this is probably more expensive. For a good*

>* compiler this probably makes no difference at all.*

>* *

>* > bit_addr = addressbits[b2 >> 2];*

>* > /* flip the bit */*

>* > buf[byte_addr] ^= (1 << bit_addr);*

>* > return 1;*

>* > -*

>* > }*

>* > - /* count nr of bits; use table lookup, faster than calculating it */*

>* > - if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1)*

>* > +*

>* > + if (!(tmp & (tmp - 1)))*

>* *

>* This is not the same as the code you replace!!!*

>* *

>* take as example b0 = 1; b1 =0, b2 = 0;*

>* The original if statement will return true:*

>* bitsperbyte[b0] in that case is 1*

>* bitsperbyte[b1] = 0*

>* bitsperbyte[b2] = 0*

>* so (bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) equals 1 and*

>* ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) yields 1 so true*

>* *

>* However if I take your code*

>* tmp = ((uint32_t)b2 << 16) | ((uint32_t)b1 << 8) | b0;*

>* taking the example values from above this results in tmp being 1;*

>* tmp & (tmp - 1) becomes 1 & 0 which effectively results in 0.*

>* *

>* So in my original code b0 = 1; b1 =0, b2 = 0; results in 1 being*

>* returned whereas your code will result in a log entry "incorrectable*

>* error" and a return of -1.*

>* *

You ignore the "!" opeator, Frans.

>* Given this, I'd say this patch is to be rejected.*

>* *

>* BTW: optimisations in this part of code are not too important. This is*

>* only called when ecc errors are to be corrected and that is not very*

>* likely.*

>* So perhaps it is not worth the time to improve it.*

>* (and yes, it might be possible to save some bytes by eliminating the*

>* array; then again the code and logic is not really trivial so good*

>* testing is needed)*

>* *

>* See also Documentation/mtd/nand_ecc.txt*

>* near the end there is a small section on correcting.*

>* *

>* > return 1; /* error in ECC data; no action needed */*

>* >*

>* > printk(KERN_ERR "uncorrectable error : ");*

>* > --*

>* > 1.7.7.6*

>* >*

>* >*

>* *

>* Best regards, Frans.*

--

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

the body of a message to majordomo@xxxxxxxxxxxxxxx

More majordomo info at http://vger.kernel.org/majordomo-info.html

Please read the FAQ at http://www.tux.org/lkml/