Re: mtd raw nand denali.c broken for Intel/Altera Cyclone V

From: Masahiro Yamada
Date: Fri Jan 10 2020 - 21:39:45 EST


On Sat, Jan 11, 2020 at 7:38 AM Tim Sander <tim@xxxxxxxxxxxxxxx> wrote:
>
> Hi
> Am Freitag, 10. Januar 2020, 20:05:20 CET schrieb Masahiro Yamada:
> > On Sat, Jan 11, 2020 at 1:47 AM Tim Sander <tim@xxxxxxxxxxxxxxx> wrote:
> > > Hi Masahiro Yamada
> > >
> > > Sorry for the large delay. I have seen the patches at
> > > https://lists.infradead.org/pipermail/linux-mtd/2019-December/092852.html
> > > Seem to resolve the question about the spare_area_skip_bytes register.
> > >
> > > I have now set the register to 2 which seems to be the right choice on an
> > > Intel SocFPGA. But still i am out of luck trying to boot 5.4.5-rt3 or
> > > 5.5-rc5. I get the following messages during bootup:
> > > [ 1.825590] denali-nand-dt ff900000.nand: timeout while waiting for irq 0x1000
> > > [ 1.832936] denali-nand-dt: probe of ff900000.nand failed with error -5
> > >
> > > But the commit c19e31d0a32dd 2017-06-13 22:45:38 predates the 4.19 kernel
> > > release (Mon Oct 22 07:37:37 2018). So it seems there is not an obvious
> > > commit which is causing the problem. Looking at the changes it might be
> > > that the timing calculations in the driver changed which might also lead
> > > to a similar error.
> > >
> > > I am booting via NFS; the bootloader is placed in NOR flash. The corresponding
> > > nand dts entry is updated to the new format and looks like this:
> > > nand@ff900000 {
> > >         #address-cells = <0x1>;
> > >         #size-cells = <0x0>;
> > >         compatible = "altr,socfpga-denali-nand";
> > >         reg = <0xff900000 0x100000 0xffb80000 0x10000>;
> > >         reg-names = "nand_data", "denali_reg";
> > >         interrupts = <0x0 0x90 0x4>;
> > >         clocks = <0x2d 0x1e 0x2e>;
> > >         clock-names = "nand", "nand_x", "ecc";
> > >         resets = <0x6 0x24>;
> > >         status = "okay";
> > >
> > >         nand@0 {
> > >                 reg = <0x0>;
> > >                 #address-cells = <0x1>;
> > >                 #size-cells = <0x1>;
> > >
> > >                 partition@0 {
> > >                         label = "work";
> > >                         reg = <0x0 0x10000000>;
> > >                 };
> > >         };
> > > };
> > >
> > >
> > > The last kernel i am able to boot is 4.19.10. I have tried booting:
> > > 5.1.21, 5.2.9, 5.3-rc8, 5.4.5-rt3 and 5.5-rc5. They all failed.
> > > Unfortunately the range is quite large for bisecting the problem. It also
> > > occurred to me that all the platforms with Intel Cyclone V in mainline
> > > are development boards which boot from SD-card not exhibiting this
> > > problem on their default boot path.
> > What will happen if you apply all of these:
> >
> > http://patchwork.ozlabs.org/project/linux-mtd/list/?series=149821
> I have applied this patch set but it does not help completely.


OK, I just wanted to eliminate any other possibility, just in case.


> The timings are
> wrong. I don't have access to the hardware now but one thing i tested before i
> left (the HW) was to write the NAND timings from the bootloader into the
> denali controller after the driver configured the timings in denali_init.
> After that the driver worked again for me.
>
> > on top of the mainline kernel,
> > and then, hack denali->clk_rate and denali->clk_x_rate as follows?
> >
> >
> > - denali->clk_rate = clk_get_rate(dt->clk);
> > - denali->clk_x_rate = clk_get_rate(dt->clk_x);
> > + denali->clk_rate = 50000000;
> > + denali->clk_x_rate = 200000000;
> >
> > If it still fails, what about this?
> >
> > denali->clk_rate = 0;
> > denali->clk_x_rate = 0;
> Will try the above next week. Skimming over the socfpga.dtsi it seems as if
> on the Intel SocFPGA the OSC1 has a value of 25000000 set in
> socfpga_cyclone5.dtsi (I am currently not sure about the clock tree with all
> the plls and i am missing the value of osc2?). Also right now it seems i am too
> tired to parse denali_setup_data_interface...


You do not need to parse denali_setup_data_interface().


There are good hints.

You said:
"The last kernel i am able to boot is 4.19.10. I have tried booting:
5.1.21, 5.2.9, 5.3-rc8, 5.4.5-rt3 and 5.5-rc5. They all failed."

There is no commit between 4.19.10 and 5.1.21
that changes denali_setup_data_interface().

So, denali_setup_data_interface() is not the
root cause.


From the information you provided,
I suspect some clock settings are wrong.
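
To illustrate why a wrong clock rate alone can break the timings, here is
a minimal sketch. It is not code from denali.c; the helper name and the
25 ns figure are hypothetical. It only shows how a timing given in
picoseconds turns into a cycle count that depends on the clock rate the
software believes the controller is running at.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the driver): number of clock cycles
 * needed to cover a timing requirement of t_ps picoseconds, given the
 * clock rate the software assumes.
 */
static uint32_t cycles_for(uint64_t t_ps, uint64_t assumed_clk_hz)
{
	uint64_t period_ps;

	assert(assumed_clk_hz != 0);

	/* Clock period in picoseconds, rounded down. */
	period_ps = 1000000000000ULL / assumed_clk_hz;

	/* Round up so the programmed time is never shorter than t_ps. */
	return (uint32_t)((t_ps + period_ps - 1) / period_ps);
}

/*
 * Real duration, in picoseconds, of 'cycles' clock cycles when the
 * hardware actually runs at real_clk_hz.
 */
static uint64_t real_duration_ps(uint32_t cycles, uint64_t real_clk_hz)
{
	assert(real_clk_hz != 0);
	return (uint64_t)cycles * (1000000000000ULL / real_clk_hz);
}
```

For example, a 25 ns requirement needs 5 cycles at 200 MHz. If the
software assumed 100 MHz instead, it would program 3 cycles, which on a
clock really running at 200 MHz lasts only 15 ns. That is the kind of
mismatch that hard-coding the two rates would expose.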



> > > PS: Here is some snippet from an older mail i didn't sent to the list yet
> > > which might be superseded by now:
> > > To get into this matter i started reading the "Intel Cyclone V HPS TRM"
> > > Section 13-20 Preserving Bad Block Markers:
> > > "You can configure the NAND flash controller to skip over a specified
> > > number of bytes when it writes the last sector in a page to the spare
> > > area. This option write the desired offset to the spare_area_skip_bytes
> > > register in the config group. For example, if the device page size is 2
> > > KB, and the device area, set the spare_area_skip_bytes register to 2.
> > > When the flash controller writes the last sector of the page that
> > > overlaps with the spare area, it spare_area_skip_bytes must be an even
> > > number. For example, if the bad block marker is a single byte, set
> > > spare_area_skip_bytes to 2."
> >
> > I did not know this documentation.
> >
> > It says "For example" (twice),
> > it sounds uncertain to me, though.
> >
> > Anyway, an intel engineer checked the boot ROM code.
> > SPARE_AREA_SKIP_BYTES=2 is correct, he said.
> As far as i understand the documentation it must be a multiple of 2. Most
> nand flashes i know need one byte for bad block marking so 2 seems to be a
> pretty sane value.


Most NAND flashes, but not all.

See the "Bad Block Location" in this page:

http://www.linux-mtd.infradead.org/nand-data/nanddata.html



Many devices have the BBM at the 1st byte/word,
but there are devices that have it at the 6th byte.

SPARE_AREA_SKIP_BYTES=2 for SOCFPGA
would corrupt a BBM at offset 6.
So, such a device is probably not used
on SOCFPGA boards.

I am guessing that is why the UniPhier platform
adopted SPARE_AREA_SKIP_BYTES=8.
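
The reasoning can be written down as a small check. This is only a
sketch: the function name is hypothetical, and it assumes the controller
leaves exactly the first skip_bytes bytes of the spare area untouched
and may overwrite everything after them.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check (not from the driver): does a bad block marker of
 * bbm_len bytes at byte offset bbm_offset in the spare area lie entirely
 * within the first skip_bytes bytes, which are assumed to be preserved?
 */
static bool bbm_preserved(unsigned int skip_bytes,
			  unsigned int bbm_offset,
			  unsigned int bbm_len)
{
	assert(bbm_len > 0);
	return bbm_offset + bbm_len <= skip_bytes;
}
```

Under that assumption, a one-byte BBM at offset 0 is preserved with a
skip of 2, a one-byte BBM at offset 5 or 6 is not, and a skip of 8
covers all of these cases.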





> The explanation why default value of
> spare_area_skip_bytes=0 of the boot rom is a little unfortunate is also in the
> documentation: The fact that the ECC values might spill into the spare area
> where the bad block marker of the nand is located.




--
Best Regards
Masahiro Yamada