Re: [PATCH v4 1/5] mtd: nand: Create a BBT flag to access bad block markers in raw mode

From: Brian Norris
Date: Sun Oct 11 2015 - 16:03:26 EST


Hi Boris,

On Fri, Oct 02, 2015 at 08:27:38AM +0200, Boris Brezillon wrote:
> Brian, Archit,
>
> On Thu, 1 Oct 2015 19:44:34 -0700
> Brian Norris <computersforpeace@xxxxxxxxx> wrote:
>
> > On Wed, Aug 19, 2015 at 10:19:02AM +0530, Archit Taneja wrote:
> > > Some controllers can access the factory bad block marker from OOB only
> > > when they read it in raw mode. When ECC is enabled, these controllers
> > > discard reading/writing bad block markers, preventing access to them
> > > altogether.
> > >
> > > The bbt driver assumes MTD_OPS_PLACE_OOB when scanning for bad blocks.
> > > This results in the nand driver's ecc->read_oob() op to be called, which
> > > works with ECC enabled.
> > >
> > > Create a new BBT option flag that tells nand_bbt to force the mode to
> > > MTD_OPS_RAW. This would result in the correct op being called for the
> > > underlying nand controller driver.
>
> Actually I have the same kind of patch in my local tree (for a
> different reason though: the HW randomizer can mess up with the BBM
> byte if it's not disabled, and the only way to disable it in my current
> implementation is to switch to raw mode).
>
> >
> > MTD_OPS_RAW is probably the best way to do this, and we should switch
> > back to it for all users (rather than a new flag).
>
> I'm fine with this solution, but will that be acceptable for everybody?
> I mean, some NAND controllers are able to protect some OOB bytes, and
> the BBM might fall in those OOB bytes. In this case, shouldn't we rely
> on the ECC protection instead of reading the OOB in raw mode?

I think ECC is kind of misused a bit here. It's not really meant for
protecting BBMs, and it's also really not sufficient, esp. given
bitflips in erased areas.

> > But to do this, we
> > need to fix up some things. Particularly, we need to extend
> > 'badblockbits' support so that it is applied consistently in all places
> > (I recall there is one code path in which bad block scanning does take
> > this into account, and one that doesn't.)
>
> Yes, IIRC Andrea has posted a patch addressing that problem [1].
> Another problem I see is that badblockbits is currently assigned a
> fixed value by the NAND controller driver (or a default value of 8).
> There's no specific logic to correlate it to the required ECC strength.
> IMO, we should not let each NAND controller driver decide what is the
> appropriate value for each chip but rather implement the logic in
> nand_base.c based on ecc->strength and ecc->size, and IIRC this was
> the question Andrea asked when he posted his proposal.
>
> >
> > About badblockbits: it allows us to do a relaxed heuristic on matching
> > bad block markers, where we say the BBM is "bad" if more than fewer than
> > N bits are '1'. Right now, we just say that if there are any 0 bits in
> > the Bad Block Marker (BBM) region, then the block is bad. But this is
> > problematic for pages that have been worn down and might have bitflips.
> > So right now, part of a (bad) solution is to read with ECC, so worn
> > blocks that have data won't be later interpreted as bad blocks if we
> > rescan the BBMs (ECC will correct the bitflips, if the OOB is
> > protected).
> >
> > But that solution is not really good, since ECC is not really a panacea
> > for misinterpreted BBMs. And HW like yours apparently won't work like
> > this.
>
> Okay, I see you gave pretty much the same explanation, which makes mine
> useless :-).
>
> >
> > So in summary: if we can consistently make BBM checks look for 6 or 7
> > "one" bits (rather than a full 8 bits, i.e. BBM == 0xff), then we can
> > just unconditionally switch to RAW rather than PLACE_OOB. And we don't
> > need a flag like this pach introduces.
>
> I guess it all depends whether we want to let NAND controllers that can
> protect their BBM keep doing it (which IMO is not such a bad idea).

I think I was the only one consciously trying to do this. (Though I
guess it's possible some people discreetly hacked it in by not
supporting raw mode properly.) And for my cases, I'm pretty sure a
properly-improved raw mode BBM scan would be just as good, or actually
better. So I'm not sure anyone would really notice if we switched back
and properly accounted for flips.

Brian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/