Re: [PATCH 1/3] thermal: ti-soc-thermal: Fix stuck sensor with continuous mode for 4430

From: Adam Ford
Date: Fri Jan 08 2021 - 14:43:01 EST


On Fri, Jan 8, 2021 at 12:31 PM Adam Ford <aford173@xxxxxxxxx> wrote:
>
> On Fri, Jan 8, 2021 at 7:45 AM Adam Ford <aford173@xxxxxxxxx> wrote:
> >
> > On Fri, Jan 8, 2021 at 1:22 AM Tony Lindgren <tony@xxxxxxxxxxx> wrote:
> > >
> > > * H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> [201230 13:29]:
> > > > > > On 30.12.2020 at 13:55, Adam Ford <aford173@xxxxxxxxx> wrote:
> > > > > On Wed, Dec 30, 2020 at 2:43 AM Tony Lindgren <tony@xxxxxxxxxxx> wrote:
> > > > >>
> > > > >> At least for 4430, trying to use the single conversion mode eventually
> > > > >> hangs the thermal sensor. This can be quite easily seen with errors:
> > > > >>
> > > > >> thermal thermal_zone0: failed to read out thermal zone (-5)
> > > ...
> > >
> > > > > I don't have an OMAP4, but if you want, I can test a DM3730.
> > > >
> > > > Indeed I remember a similar discussion from the DM3730 [1]. temp values were
> > > > always those from the last measurement. E.g. the first one was done
> > > > during (cold) boot and the first request after 10 minutes did show a
> > > > quite cold system... The next one did show a hot system independent
> > > > of what had been between (suspend or high activity).
> > > >
> > > > It seems as if it was even reproducible with a very old kernel on a BeagleBoard.
> > > > So it is quite fundamental.
> > > >
> > > > We tried to fix it but did not come to a solution [2]. So we opened an issue
> > > > in our tracker [3] and decided to stay with continuous conversion although this
> > > > raises idle mode processor load.
> > >
> > > Hmm so maybe eocz high always times out in single mode since it also
> > > triggers at least on dra7?
> > >
> > > Yes it would be great if you guys can give the $subject patch a try at
> > > least on your omap36xx and omap5 boards and see if you see eocz
> > > time out warnings in dmesg.


I do see chatter.

[ 15.531005] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low
[ 16.571075] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low
[ 17.610961] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low

and it repeats quite often.
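
To put a rough number on "quite often", the timestamps of the timeout lines can be compared. This is only a minimal sketch: it assumes the dmesg output has been captured as text, the helper names (eocz_timeouts, intervals) are made up for illustration, and the sample data is just the three lines quoted above.

```python
import re

# Sample lines copied from the dmesg output quoted above.
LOG = """\
[ 15.531005] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low
[ 16.571075] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low
[ 17.610961] ti-soc-thermal 48002524.bandgap: eocz timed out waiting low
"""

# Kernel timestamp in brackets, followed somewhere by the timeout text.
PATTERN = re.compile(r"^\[\s*(\d+\.\d+)\]\s.*eocz timed out")


def eocz_timeouts(text):
    """Return the kernel timestamps (seconds) of every eocz timeout line."""
    stamps = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def intervals(stamps):
    """Return the gaps in seconds between consecutive timestamps."""
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]


stamps = eocz_timeouts(LOG)
print(len(stamps), intervals(stamps))
```

On the three lines above, the gaps come out at roughly one second each.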

I would say this patch series would cause a regression on the DM3730.

adam


> >
> > I should be able to try it on the dm3730 logicpd-torpedo kit this weekend.
>
> I am going to be a bit delayed testing this. I cannot boot omap2plus
> using Linux version 5.11.0-rc2.
>
> [ 2.666748] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
> [ 2.673309] nand: Micron MT29F4G16ABBDA3W
> [ 2.677368] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> [ 2.685119] nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
> [ 2.693237] Invalid ECC layout
> [ 2.696350] omap2-nand 30000000.nand: unable to use BCH library
> [ 2.702575] omap2-nand: probe of 30000000.nand failed with error -22
> [ 2.716094] 8<--- cut here ---
> [ 2.719207] Unable to handle kernel NULL pointer dereference at virtual address 00000018
> [ 2.727600] pgd = (ptrval)
> ...
> [ 3.050933] ---[ end trace 59640c7399a80a07 ]---
> [ 3.055603] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [ 3.063323] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
>
> Once I get past this, I'll try to test the thermal stuff.
>
> adam
>
> >
> > adam
> > >
> > > Regards,
> > >
> > > Tony