Re: [PATCH] mtd: brcmnand: Workaround false ECC uncorrectable errors
From: Jonas Gorski
Date: Wed Dec 02 2015 - 16:08:55 EST
Hi,
On Wed, Dec 2, 2015 at 9:54 PM, Brian Norris
<computersforpeace@xxxxxxxxx> wrote:
> Hi,
>
> On Wed, Dec 02, 2015 at 09:44:04PM +0100, Jonas Gorski wrote:
>> On Wed, Dec 2, 2015 at 9:17 PM, Simon Arlott <simon@xxxxxxxxxxx> wrote:
>> > On 01/12/15 10:41, Jonas Gorski wrote:
>> >> On Sat, Nov 28, 2015 at 8:23 PM, Simon Arlott <simon@xxxxxxxxxxx> wrote:
>> >>> +
>> >>> + /* Go to start of buffer */
>> >>> + buf -= FC_WORDS;
>> >>> +
>> >>> + /* Erased if all data bytes are 0xFF */
>> >>> + buf_erased = memchr_inv(buf, 0xFF, FC_WORDS) == NULL;
>> >>> +
>> >>> + if (!buf_erased)
>> >>> + goto out_free;
>> >>
>> >> We now have a function exactly for that use case in 4.4,
>> >> nand_check_erased_buf [1], consider using that. This also has the
>> >> benefit of treating bit flips as correctable as long as the ECC scheme
>> >> is strong enough.
>> >
>> > I have no idea whether or not it's appropriate to specify
>> > bitflips_threshold > 0 so it'd just be a more complex way to do
>> > a memchr_inv() search for 0xFF.
>>
>> The threshold would be the amount of bitflips the code can correct, so
>> basically ecc.strength (at least that is my understanding).
>>
>> > The code also has to check for the hamming code bytes being all 0x00,
>> > because according to the comments [2], the controller also has
>> > difficulty with the non-erased all-0xFFs scenario too.
>>
>> According to brcmnand.c hamming can fix up to fifteen bitflips, but in
>
> Hamming only protects 1 bitflip. The '15' is the value used by the
> controller to represent Hamming (i.e., there is no BCH-15).
Ah, yeah, that confused me: I also vaguely remembered Hamming only
protecting against a single bitflip, but then I saw the ecc_level = 15
assignment.
Still, that means that even a Hamming-protected erased page with a
single bitflip should be treated as readable (all-0xff) with one
corrected bitflip, and not as uncorrectable.
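To make that concrete, here is a rough userspace sketch of the kind of
check I mean. It is not the kernel's nand_check_erased_buf();
count_erased_bitflips() and its -1 error return are made up for
illustration, and max_bitflips is assumed to be whatever the ECC scheme
can correct (1 for Hamming).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel function: count bits that read as 0
 * in a buffer that is expected to be all 0xff. Returns the number of
 * bitflips found, or -1 as soon as the count exceeds max_bitflips.
 */
static int count_erased_bitflips(const uint8_t *buf, size_t len,
				 int max_bitflips)
{
	int flips = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t zeros = (uint8_t)~buf[i];	/* set bits mark flipped cells */

		while (zeros) {
			zeros &= (uint8_t)(zeros - 1);	/* clear lowest set bit */
			if (++flips > max_bitflips)
				return -1;	/* too many flips, not "erased" */
		}
	}

	return flips;
}
```

With max_bitflips = 1, a buffer with exactly one flipped bit is reported
as erased with one bitflip, while max_bitflips = 0 degenerates into the
plain all-0xff test and rejects the same buffer.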
>> the current code you would fail a hamming protected all-0xff-page for
>> even a single bitflip in the data or in the ecc bytes, which means
>> that all-0xff-pages wouldn't be protected at all.
>
> BTW, I think Kamal had code to handle bitflips in erased pages in the
> Broadcom STB Linux BSP. Perhaps he can port that to
> upstream with nand_check_erased_ecc_chunk()? IIUC, that would probably
> handle your case too, Simon, although it wouldn't be optimal for an
> all-0xff check (i.e., bitflip_threshold == 0).
>
> If that's really an issue (i.e., we have an implementation + data), I'm
> sure we could add optimization to nand_check_erased_ecc_chunk() to
> support the bitflip_threshold == 0 case.
Maybe I'm missing something, but wasn't the point of introducing
nand_check_erased_ecc_chunk() that bitflips in erased pages should be
treated as bitflips corrected by the ECC, and therefore fixed up
before the data is passed on? So a threshold of 0 would be wrong (no
protection at all), and could be quite destructive on MLC NAND, where
bitflips in erased pages are rather common.
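As a sketch of the behaviour I have in mind (userspace C, not the actual
kernel helper; check_erased_chunk() and zero_bits() are hypothetical
names, and the threshold is assumed to be the ECC strength): data and
ECC bytes share one bitflip budget, and a chunk within budget is fixed
up to all-0xff before being handed on.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count 0 bits in a region that should read back as all 0xff. */
static int zero_bits(const uint8_t *buf, size_t len)
{
	int n = 0;

	for (size_t i = 0; i < len; i++)
		for (uint8_t z = (uint8_t)~buf[i]; z; z &= (uint8_t)(z - 1))
			n++;

	return n;
}

/*
 * Hypothetical sketch, not the kernel helper. The data and ECC bytes of
 * one ECC step share a single budget ("threshold"). Returns the number of
 * bitflips and rewrites both regions to 0xff if the chunk counts as
 * erased, or -1 (buffers untouched) if there are too many flips.
 */
static int check_erased_chunk(uint8_t *data, size_t datalen,
			      uint8_t *ecc, size_t ecclen, int threshold)
{
	int flips = zero_bits(data, datalen) + zero_bits(ecc, ecclen);

	if (flips > threshold)
		return -1;	/* treat as a real uncorrectable error */

	if (flips) {
		/* Fix up the flipped bits so callers see a clean erased chunk. */
		memset(data, 0xff, datalen);
		memset(ecc, 0xff, ecclen);
	}

	return flips;
}
```

With threshold == 0 any single flipped bit makes the chunk fail, which
is exactly the "no protection at all" case.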
Jonas