Re: [PATCHv3] w1: omap-hdq: Simplify driver with PM runtime autosuspend

From: Andreas Kemnade
Date: Tue Apr 21 2020 - 02:53:47 EST


On Mon, 20 Apr 2020 23:11:18 +0200
"H. Nikolaus Schaller" <hns@xxxxxxxxxxxxx> wrote:

> Hi Tony,
>
> > Am 20.04.2020 um 17:08 schrieb Tony Lindgren <tony@xxxxxxxxxxx>:
> >
> > * H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> [200417 21:04]:
> >> To me it looks as if reading hdq too quickly after omap_hdq_runtime_resume()
> >> may be part of the problem, although it is 0.4 seconds between [ 18.355163]
> >> and [ 18.745269]. So I am not sure about my interpretation.
> >>
> >> A different attempt for interpretation may be that trying to read the
> >> slave triggers omap_hdq_runtime_resume() just before doing the
> >> first hdq_read_byte().
> >
> > Hmm so I wonder if adding msleep(100) at the end of
> > omap_hdq_runtime_resume() might help?
>
> I have tried and initially it did boot and work once.
> But after the second boot/reboot the effect was back.
>
> This is something I have observed previously, that the issue
> is there in ca. 9 of 10 boot attempts. So I would assume
> some race condition with udev reading the uevent file of the
> bq27xxx bus client and hence through hdq.
>
> I had already noticed some hdq_read activity right after probing
> success.
>
> I had also tried to change pm_runtime_set_autosuspend_delay(, 1000)
> with no success. And I tried to call omap_hdq_runtime_resume() at the
> end of probe.
>
> The only maybe important observation was when I disabled all
> kernel modules except *hdq*.ko and *bq27*.ko. Then I did only
> get an emergency shell so that it is quite similar to the
> scenario Andreas has tested. With this setup it did work.
>
So I guess that was without idling UARTs?

> I then tried to reenable other kernel modules, but the outcome
> was not consistent enough to draw a reliable conclusion.
>
> So I have still no clear indication when the problem occurs and
> when not.
>
Hmm, last summer I had problems reading the temperature during UMTS
transfers, even without that patch. Maybe there is a connection,
maybe not. In that scenario we might have EMC issues, thermal problems,
or a real kernel problem.

Regards,
Andreas