RE: [PATCH v2 00/17] Add a bakvol module in UBI layer for MLC paired page power loss issue

From: Bean Huo 霍斌斌 (beanhuo)
Date: Wed Feb 03 2016 - 01:13:48 EST


Dear Richard
Thanks for reviewing my patches and valuable feedback.
I also want to work with you and Boris on such solution,
I know this is a complicated task, need our joint effort.

Following is my explanation for each of concerns:

> Bean,
>
> Am 02.02.2016 um 03:30 schrieb Bean Huo:
> > This version patches based on Linux kernel 4.2-rc7.
> >
> > v2:
> > 1.Add CRC32 protection for user OOB area data.
> > 2.Move recovery action from attach step to ubifs mount step.
> > 3.Add more comments for some key function.
> > 4.standard multi-plane program function.
> > 5.Standard send-mail patches
>
> It is sad to see that you've ignored almost all review comments that have
> been made.
> Both comments made by Boris and me on the public mailing list as well as the
> comments and I made on our phone call.
>
> To summarize, I see the following major issues with this patch series:
> 1. It makes use of OOB, it was very clearly stated that this violates one of
> UBI's design principles.
Currently I still cannot find a good method on how to manage backup info.
So now I still use OOB.
Backup page OOB stores source page address, and source page OOB store backup page address,
They store each other's page address, one is in order to indicate that backup page
is really one copy data of source page, in case of source block being erased and re-mapped again.
Another is in order to quickly find corresponding source page.

> 2. It is not generic, you use hard coded constants for Micron 70s/80s MLC
> NANDs.
> And AFAICT it is not guaranteed that all MLC NANDs support multi pane
> writes.
Yes, so far ,different NAND vendor with different paired page sequence,
So if guarantee all vendor NAND, should involve every vendor NAND paired page
Sequence table or calculator function, but now I don't have other NAND vendor
datasheet.

> 3. Not all data is protected. Most of UBI's meta data (EC&VID headers,
> volume table)
> are explicitly not protected, as well as UBIFS's superblock and master
> nodes.
For master node:
current UBIFS master node already has one copy for power loss,
Just not cover MLC paired page power loss issue, if my previous Master node issue
patch be accepted, master node totally does not need to protection.

EC&VID:
For EC header, according to my testing result data and our NAND PE cycle requirement,
If EC header be damaged by power loss , mean erase count is ok, no impact on NAND life.
But for VID header, because my solution uses dual plane page program, it has special page address
Requirement on dual plane page address, that is page number should be the same.
UBI already pre-program page 0 and page 1, so no page 1 is for VID.


> 4. It mixes UBI and UBIFS code. Having code like this in UBI's EBA code is not
> acceptable:
> ubi_io_write_data(ubi, buf, pnum, offset, len, ((lnum < 3) ? 0 : 1));
> This hack is here to void safe writes of UBIFS's suberblock and master
> nodes.
> But UBIFS specific code has no business in UBI core code. Also See 3.
These codes need to modify. Thanks.

> 5. It does not scale. Having a backup of every important page would require
> much more space.
> Especially UBI EC and VID headers. I bet this is also the reason why the
> current implementation
> does not protect them. Otherwise the bakvol would fill up immediately.
> Another scalability problem is that the whole bakvol has to be scanned
> which will slow down
> the attach/mount process and may void the speedup gained by Fastmap.
This is also my concern, maybe next version I can fix this by store bakvol PEB info
Into flash.

> 6. It focuses only on UBIFS. ubi_corrupted_data_recovery() is only called
> from UBIFS code.
> If UBI is changed, we need a solution for all users on top of UBI.
> Now I understand also why you want the "fs:ubifs:recovery:fixup UBIFS
> cannot recover master node issue"
> patch. As UBIFS master nodes are not protected by bakvol they will
> corrupt and UBIFS is not able to recover
> nor can run ubi_corrupted_data_recovery().
>
> 7. A full bakvol is not proper handled. The implementation falls back to
> "unsafe" writes.
> dbg_gen("Allocate new PEB for Bakvol.\n");
> pbk = allo_new_block_for_bakvol(ubi, oppe_plane);
> if (!pbk) {
> ubi_err(ubi, "Allocate new PEB failed.\n");
> nobak = 1;
> goto Only_source;
> }
> You cannot expect users to query the kernel log for such error messages.
>
>
> Beside of these issues there are a lot of small implementation issues.
> For example Fastmap is broken as UBI_FM_SB_VOLUME_ID clashes with
> UBI_BACKUP_VOLUME_ID.
> Or struct list_head is part of on-Flash data structures.
>
> That said, it don't think the bakvol approach is the solution we're looking for.
> A proper solution has to fulfill the following conditions:
> 1. No dependency on OOB data.
> 2. Can work with any MLC NANDs and controllers. Boris and I have access to
> Hynix, Micron and Toshiba NAND so far.
> 3. If implemented on UBI level, it has to work for any user on top of UBI.
> 4. Has to be able to protect all data.
> 5. Must not regress current UBI or UBIFS.
>
> Boris and I are working on such a solution and would like to invite you to join
> us.
> The topic is complicated and not easy to solve. But with joined forces we will
> be able to find a good solution. :-)
Look forward to work with you for this.

> Thanks,
> //richard