Re: [PATCH v2 1/2] mtd: nand: Add Cadence NAND controller driver

From: Piotr Sroka
Date: Thu Jun 06 2019 - 12:09:45 EST


Hi Miquel


The 05/12/2019 14:24, Miquel Raynal wrote:
EXTERNAL MAIL


EXTERNAL MAIL


Hi Piotr,

Sorry for de delay.

Piotr Sroka <piotrs@xxxxxxxxxxx> wrote on Thu, 21 Mar 2019 09:33:58
+0000:

The 03/05/2019 19:09, Miquel Raynal wrote:
>EXTERNAL MAIL
>
>
>Hi Piotr,
>
>Piotr Sroka <piotrs@xxxxxxxxxxx> wrote on Tue, 19 Feb 2019 16:18:23
>+0000:
>
>> This patch adds driver for Cadence HPNFC NAND controller.
>>
>> Signed-off-by: Piotr Sroka <piotrs@xxxxxxxxxxx>
>> ---
>> Changes for v2:
>> - create one universal wait function for all events instead of one
>> function per event.
>> - split one big function executing nand operations to separate
>> functions one per each type of operation.
>> - add erase atomic operation to nand operation parser
>> - remove unnecessary includes.
>> - remove unused register defines
>> - add support for multiple nand chips
>> - remove all code using legacy functions
>> - remove chip dependents parameters from dts bindings, they were
>> attached to the SoC specific compatible at the driver level
>> - simplify interrupt handling
>> - simplify timing calculations
>> - fix calculation of maximum supported cs signals
>> - simplify ecc size calculation
>> - remove header file and put whole code to one c file
>> ---
>> drivers/mtd/nand/raw/Kconfig | 8 +
>> drivers/mtd/nand/raw/Makefile | 1 +
>> drivers/mtd/nand/raw/cadence-nand-controller.c | 3288 ++++++++++++++++++++++++
>
>This driver is way too massive, I am pretty sure it can shrink a
>little bit more.
>[...]
>
I will try to make it shorer but it will be difucult to achive. It is because - there are a lot of calculation needed for PHY - ECC are interleaved with data (like on marvell-nand or gpmi-nand).
Therefore: + RAW mode is complicated + protecting BBM increases number of lines of source code
- need to support two DMA engines internal and external (slave) We will see on next patch version what is the result. That page layout looks:

Maybe you don't need to support both internal and external DMA?

I am pretty sure there are rooms for size reduction.

I describe how it works in general and maybe you help me chose better solution.

HW controller can work in 3 modes. PIO - can work in master or slave DMA
CDMA - needs Master DMA for accessing command descriptors.
Generic mode - can use only Slave DMA.

Generic mode is neccessery to implement functions other than page
program, page read, block erase. So it is essential. I cannot avoid
to use Slave DMA.

I change CDMA mode to PIO mode. Then I can use only slave DMA. But CDMA has a feature which is not present in PIO mode. The feature
gives possibility to point DMA engine two buffers to transfer. It is
used to point data buffer and oob bufer. In PIO mode I would need to
copy data buffer and oob buffer to third buffer. Next transfer data from
third buffer.
In that solution we need to copy all data by CPU and then use DMA. Controller needs always transfer oob because of HW ECC restrictions. Such change will decrease performce for all data transfers.
I think performance is more important in that case. What is your
opinion?
[...]
>
>What is this for?
Fucntions enables/disables hardware detection of erased data
pages. >

Ok, the name is not very explicit , maybe you could tell this with a
comment.

Ok.

>> +
>> +/* hardware initialization */
>> +static int cadence_nand_hw_init(struct cdns_nand_ctrl *cdns_ctrl)
>> +{
>> + int status = 0;
>> + u32 reg;
>> +
>> + status = cadence_nand_wait_for_value(cdns_ctrl, CTRL_STATUS,
>> + 1000000,
>> + CTRL_STATUS_INIT_COMP, false);
>> + if (status)
>> + return status;
>> +
>> + reg = readl(cdns_ctrl->reg + CTRL_VERSION);
>> +
>> + dev_info(cdns_ctrl->dev,
>> + "%s: cadence nand controller version reg %x\n",
>> + __func__, reg);
>> +
>> + /* disable cache and multiplane */
>> + writel(0, cdns_ctrl->reg + MULTIPLANE_CFG);
>> + writel(0, cdns_ctrl->reg + CACHE_CFG);
>> +
>> + /* clear all interrupts */
>> + writel(0xFFFFFFFF, cdns_ctrl->reg + INTR_STATUS);
>> +
>> + cadence_nand_get_caps(cdns_ctrl);
>> + cadence_nand_read_bch_cfg(cdns_ctrl);
>
>No, you cannot rely on the bootloader's configuration. And I suppose
>this is what the first call to read_bch_cfg does?
I do not realy on boot loader. Just read NAND flash
controller configuration from read only capabilities registers.

Ok, if these are RO registers, it's fine. But maybe don't call the
function "read bch config" which suggest that this is something you can
change.

ok.



>> +
>> +#define TT_OOB_AREA 1
>> +#define TT_MAIN_OOB_AREAS 2
>> +#define TT_RAW_PAGE 3
>> +#define TT_BBM 4
>> +#define TT_MAIN_OOB_AREA_EXT 5
>> +
>> +/* prepare size of data to transfer */
>> +static int
>> +cadence_nand_prepare_data_size(struct nand_chip *chip,
>> + int transfer_type)
>> +{
>> + struct cdns_nand_ctrl *cdns_ctrl = to_cdns_nand_ctrl(chip->controller);
>> + struct cdns_nand_chip *cdns_chip = to_cdns_nand_chip(chip);
>> + u32 sec_size = 0, last_sec_size, offset = 0, sec_cnt = 1;
>> + u32 ecc_size = chip->ecc.bytes;
>> + u32 data_ctrl_size = 0;
>> + u32 reg = 0;
>> +
>> + if (cdns_ctrl->curr_trans_type == transfer_type)
>> + return 0;
>> +
>> + switch (transfer_type) {
>
>Please turn the controller driver as dumb as possible. You should not
>care which part of the OOB area you are accessing.
It is a bit confusing for me how accessing OOB should be implemented.
I know that read_oob function is called to check BBM value when BBT is
initialized. It is also a bit confusing for me why the raw version is
not used for that purpose. In current implementation if you write oob by write_page function next
read oob by read_oob function then data will be the same.
If I implement dump functions read_oob and write_oob then
1. ECC must be disabled for these functions
2. oob data accessing by write_page/read_page will be different
(different offsets) that the data accessing by read_oob/write_oob
functions

No, I fear this is not acceptable.

If above described "functionalities" are acceptable I will change implementation of write_oob and read_oob functions.
The write_page and read_page must be implemented in that way as it is now. Let me know which solution is preffered.

If this is too complicated to just write the oob, why not fallback on
read/write_page (with oob_required and a dummy data buffer)?

I considered it. Actually, it would simplify the code. The disadvantage
of using the same function is that the each write/read oob will cause full page
read/write. In current version only last sector is read/write together
with oob. This will affect the performance degradation of oob write/read function. So I do not know what is more important. 1. OOB functions performance,
2. simplier code.

>> + case TT_OOB_AREA:
>> + offset = cdns_chip->main_size - cdns_chip->sector_size;
>> + ecc_size = ecc_size * (offset / cdns_chip->sector_size);
>> + offset = offset + ecc_size;
>> + sec_cnt = 1;
>> + last_sec_size = cdns_chip->sector_size
>> + + cdns_chip->avail_oob_size;
>> + break;
>> + case TT_MAIN_OOB_AREA_EXT:
>> + sec_cnt = cdns_chip->sector_count;
>> + last_sec_size = cdns_chip->sector_size;
>> + sec_size = cdns_chip->sector_size;
>> + data_ctrl_size = cdns_chip->avail_oob_size;
>> + break;
>> + case TT_MAIN_OOB_AREAS:
>> + sec_cnt = cdns_chip->sector_count;
>> + last_sec_size = cdns_chip->sector_size
>> + + cdns_chip->avail_oob_size;
>> + sec_size = cdns_chip->sector_size;
>> + break;
>> + case TT_RAW_PAGE:
>> + last_sec_size = cdns_chip->main_size + cdns_chip->oob_size;
>> + break;
>> + case TT_BBM:
>> + offset = cdns_chip->main_size + cdns_chip->bbm_offs;
>> + last_sec_size = 8;
>> + break;
>> + default:
>> + dev_err(cdns_ctrl->dev, "Data size preparation failed\n");
>> + return -EINVAL;
>> + }
>> +
>> + reg = 0;
>> + reg |= FIELD_PREP(TRAN_CFG_0_OFFSET, offset);
>> + reg |= FIELD_PREP(TRAN_CFG_0_SEC_CNT, sec_cnt);
>> + writel(reg, cdns_ctrl->reg + TRAN_CFG_0);
>> +
>> + reg = 0;
>> + reg |= FIELD_PREP(TRAN_CFG_1_LAST_SEC_SIZE, last_sec_size);
>> + reg |= FIELD_PREP(TRAN_CFG_1_SECTOR_SIZE, sec_size);
>> + writel(reg, cdns_ctrl->reg + TRAN_CFG_1);
>> +
>> + reg = readl(cdns_ctrl->reg + CONTROL_DATA_CTRL);
>> + reg &= ~CONTROL_DATA_CTRL_SIZE;
>> + reg |= FIELD_PREP(CONTROL_DATA_CTRL_SIZE, data_ctrl_size);
>> + writel(reg, cdns_ctrl->reg + CONTROL_DATA_CTRL);
>> +
>> + cdns_ctrl->curr_trans_type = transfer_type;
>> +
>> + return 0;
>> +}
>> +
[...]
>> +
[...] >> + /*
>> + * the idea of those calculation is to get the optimum value
>> + * for tRP and tRH timings if it is NOT possible to sample data
>> + * with optimal tRP/tRH settings the parameters will be extended
>> + */
>> + if (sdr->tRC_min <= clk_period &&
>> + sdr->tRP_min <= (clk_period / 2) &&
>> + sdr->tREH_min <= (clk_period / 2)) {
>
>Will this situation really happen?
I think yes for follwing values trc_min 20000 ps
trp_min 10000 ps
treh_min 7000 ps
clk_period 20000 ps

Ok, you may add a comment stating that this may be the case in EDO mode
5.
I did not anwer clearly last time. It was just an example. The result of that "if" depends on NAND flash device timing mode and NAND flash controller clock. Minumum value of clk is 20MHz (50ns). So it may be a case for Asynchronous Mode 1 if
NAND flash controller clock is 20MHz. I will add this info in comment.
[...]
>> + }
>> +
>> + if (cdns_ctrl->caps2.is_phy_type_dll) {
>
>Is the else part allowed?
Register accessed in this block does not exists if is_phy_type_dll is 0. So they are preveted to be accessed. the else is not needed.
>
following register does not exist if caps2.is_phy_type_dll is 0 >> + u32 tpre_cnt = calc_cycl(tpre, clk_period);
>> + u32 tcdqss_cnt = calc_cycl(tcdqss + if_skew, clk_period);
>> + u32 tpsth_cnt = calc_cycl(tpsth + if_skew, clk_period);
>> +
>> + u32 trpst_cnt = calc_cycl(trpst + if_skew, clk_period) + 1;
>> + u32 twpst_cnt = calc_cycl(twpst + if_skew, clk_period) + 1;
>> + u32 tcres_cnt = calc_cycl(tcres + if_skew, clk_period) + 1;
>> + u32 tcdqsh_cnt = calc_cycl(tcdqsh + if_skew, clk_period) + 5;
>> +
>> + tcr_cnt = calc_cycl(tcr + if_skew, clk_period);
>> + /*
>> + * skew not included because this timing defines duration of
>> + * RE or DQS before data transfer
>> + */
>> + tpsth_cnt = tpsth_cnt + 1;
>> + reg = FIELD_PREP(TOGGLE_TIMINGS0_TPSTH, tpsth_cnt);
>> + reg |= FIELD_PREP(TOGGLE_TIMINGS0_TCDQSS, tcdqss_cnt);
>> + reg |= FIELD_PREP(TOGGLE_TIMINGS0_TPRE, tpre_cnt);
>> + reg |= FIELD_PREP(TOGGLE_TIMINGS0_TCR, tcr_cnt);
>> + t->toggle_timings_0 = reg;
>> + dev_dbg(cdns_ctrl->dev, "TOGGLE_TIMINGS_0_SDR\t%x\n", reg);
>> +
>> + //toggle_timings_1 - tRPST,tWPST
>> + reg = FIELD_PREP(TOGGLE_TIMINGS1_TCDQSH, tcdqsh_cnt);
>> + reg |= FIELD_PREP(TOGGLE_TIMINGS1_TCRES, tcres_cnt);
>> + reg |= FIELD_PREP(TOGGLE_TIMINGS1_TRPST, trpst_cnt);
>> + reg |= FIELD_PREP(TOGGLE_TIMINGS1_TWPST, twpst_cnt);
>> + t->toggle_timings_1 = reg;
>> + dev_dbg(cdns_ctrl->dev, "TOGGLE_TIMINGS_1_SDR\t%x\n", reg);
>> + }
[...] >
>This function is so complicated !!! How can this even work? Really, it
>is hard to get into the code and follow, I am sure you can do
>something.
Yes it is complicated but works, I will try to simplify it... [...]

Yes please!

>> + "CS %d already assigned\n", cs);
>> + return -EINVAL;
>> + }
>> +
>> + cdns_chip->cs[i] = cs;
>> + }
>> +
>> + chip = &cdns_chip->chip;
>> + chip->controller = &cdns_ctrl->controller;
>> + nand_set_flash_node(chip, np);
>> +
>> + mtd = nand_to_mtd(chip);
>> + mtd->dev.parent = cdns_ctrl->dev;
>> +
>> + /*
>> + * Default to HW ECC engine mode. If the nand-ecc-mode property is given
>> + * in the DT node, this entry will be overwritten in nand_scan_ident().
>> + */
>> + chip->ecc.mode = NAND_ECC_HW;
>> +
>> + /*
>> + * Save a reference value for timing registers before
>> + * ->setup_data_interface() is called.
>> + */
>> + cadence_nand_get_timings(cdns_ctrl, &cdns_chip->timings);
>
>You cannot rely on the Bootloader's configuration. This driver should
>derive it.
I do not relay on the Bootloader's configuration in any part. I just
init timings structure base on current values of registers to do not
have rubish in timing structure. Values will be calculated by driver when
setup_data_interface is called. In case set_timings is called before
setup_data_interface

Does this really happens? I am pretty sure it is taken care of by the
core. I don't think you should rely on what's in the registers at boot
time.
Ok I will check it one more time and remove if not needed.



then we write the same valus to timing registers
which are preset in registres. To be shorter timing registers will stay
unchanged. >> + ret = nand_scan(chip, cdns_chip->nsels);
>> + if (ret) {
>> + dev_err(cdns_ctrl->dev, "could not scan the nand chip\n");
>> + return ret;
>> + }
>> +
>> + ret = mtd_device_register(mtd, NULL, 0);
>> + if (ret) {
>> + dev_err(cdns_ctrl->dev,
>> + "failed to register mtd device: %d\n", ret);
>> + nand_release(chip);
>
>I think you should call nand_cleanup instead of nand_release here has
>the mtd device is not registered yet.
ok

>> + return ret;
>> + }
>> +
>> + list_add_tail(&cdns_chip->node, &cdns_ctrl->chips);
>> +
>> + return 0;
>> +}
>> +
>> +static int cadence_nand_chips_init(struct cdns_nand_ctrl *cdns_ctrl)
>> +{
>> + struct device_node *np = cdns_ctrl->dev->of_node;
>> + struct device_node *nand_np;
>> + int max_cs = cdns_ctrl->caps2.max_banks;
>> + int nchips;
>> + int ret;
>> +
>> + nchips = of_get_child_count(np);
>> +
>> + if (nchips > max_cs) {
>> + dev_err(cdns_ctrl->dev,
>> + "too many NAND chips: %d (max = %d CS)\n",
>> + nchips, max_cs);
>> + return -EINVAL;
>> + }
>> +
>> + for_each_child_of_node(np, nand_np) {
>> + ret = cadence_nand_chip_init(cdns_ctrl, nand_np);
>> + if (ret) {
>> + of_node_put(nand_np);
>> + return ret;
>> + }
>
>If nand_chip_init() fails on another chip than the first one, there is
>some garbage collection to do.
ok

>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int cadence_nand_init(struct cdns_nand_ctrl *cdns_ctrl)
>> +{
>> + dma_cap_mask_t mask;
>> + int ret = 0;
>> +
>> + cdns_ctrl->cdma_desc = dma_alloc_coherent(cdns_ctrl->dev,
>> + sizeof(*cdns_ctrl->cdma_desc),
>> + &cdns_ctrl->dma_cdma_desc,
>> + GFP_KERNEL);
>> + if (!cdns_ctrl->dma_cdma_desc)
>> + return -ENOMEM;
>> +
>> + cdns_ctrl->buf_size = 16 * 1024;
>
>s/1024/SZ_1K/
>
>> + cdns_ctrl->buf = kmalloc(cdns_ctrl->buf_size, GFP_KERNEL);
>
>If you use kmalloc here then this buffer will always be DMA-able,
>right?
Right I have seen such solution in another driver.


Thanks for revieving this patch. Please answer on my question how write_oob
and read_oob functions should be implemented.

>
>
>Thanks,
>MiquÃl

Thanks
Piotr Sroka

Thanks,
MiquÃl

Thanks
Piotr