Re: [PATCH 1/2] mmc: bcm2835: reset host on timeout

From: Michal Suchánek
Date: Sun Mar 04 2018 - 13:44:23 EST


On Sun, 4 Mar 2018 19:11:49 +0100 (CET)
Stefan Wahren <stefan.wahren@xxxxxxxx> wrote:

> Hi Michal,
>
> > Michal Suchánek <msuchanek@xxxxxxx> wrote on 4 March 2018 at
> > 16:57:
> >
> >
> > On Wed, 14 Feb 2018 21:30:16 +0100 (CET)
> > Stefan Wahren <stefan.wahren@xxxxxxxx> wrote:
> >
> > > Hi Michal,
> > >
> > > > Michal Suchánek <msuchanek@xxxxxxx> wrote on 14 February 2018
> > > > at 20:24:
> > > >
> > > >
> > > > On Wed, 14 Feb 2018 17:49:31 +0100
> > > > Stefan Wahren <stefan.wahren@xxxxxxxx> wrote:
> > > >
> > > > > Hi Michal,
> > > > >
> > > > > [add Phil]
> > > > >
> > > > > On 14.02.2018 at 17:13, Michal Suchánek wrote:
> > > > > > On Wed, 14 Feb 2018 16:36:49 +0100
> > > > > > Michal Suchánek <msuchanek@xxxxxxx> wrote:
> > > > > >
> > > > > >> On Wed, 14 Feb 2018 15:58:31 +0100
> > > > > >> Stefan Wahren <stefan.wahren@xxxxxxxx> wrote:
> > > > > >>
> > > > > >>> Hi Michal,
> > > > > >>>
> > > > > >>> On 14.02.2018 at 15:38, Michal Suchanek wrote:
> > > > > >>>> The bcm2835 mmc host tends to lock up for an unknown
> > > > > >>>> reason, so reset it on timeout. The upper mmc block layer
> > > > > >>>> tries retransmitting with single blocks, which tends to
> > > > > >>>> work out after a long wait.
> > > > > >>>>
> > > > > >>>> This is better than giving up and leaving the machine
> > > > > >>>> broken for no obvious reason.
> > > > > >>> could you please provide more information about this issue
> > > > > >>> (affected hardware, kernel config, version, dmesg,
> > > > > >>> reproducible scenario)?
> > > > > > It tends to reproduce when upgrading a few packages with
> > > > > > zypper and otherwise at random during system operation. It
> > > > > > seems that for my card it worsens with age to some degree
> > > > > > so perhaps it depends on the fragmentation of the internal
> > > > > > card flash.
> > > > > >
> > > > > > Attaching dmesg and kernel config.
> > > > >
> > > > > did you notice this issue before 4.15-rc4?
> > > >
> > > > I initially noticed it with a 4.4 kernel with some backports to
> > > > make it bootable on RPi.
> > >
> > > this confuses me. Gerd and I ported this driver from downstream
> > > and it finally got merged in 4.12.
> > >
> > > So do you mean that you backported the mainline version to 4.4 or
> > > the downstream version of 4.4?
> >
> > I did not backport it, but looking at the changelog it is a backport
> > of the 4.12 driver. It does not look like the 4.15 driver, though.
> > Looks like there was some reorganization of the bcm mmc since then.
> >
> > >
> > > At a quick look they seem identical, but they aren't.
> > >
> > > > >
> > > > > Could you please test with 4.15 final again?
> > > >
> >
> > I tried upgrading to the current master (4.16-rc3+) and the issue is
> > still reproducible, although less frequently. I did a full upgrade
> > from the install image, which installs over 300 packages, and the
> > issue triggered somewhere around the 200th, whereas before,
> > installing half a dozen packages would usually trigger it.
> >
>
> this is the same as what I did during my stress tests. The step
> installed 475 packages. The timeout never occurred.

First off, you did your testing with a Tumbleweed image, which probably
uses btrfs for /, while I use a Leap 42.3 image, which uses ext4.

So far I have not been able to reproduce the issue by installing
packages: I installed GNOME, which is over 700 packages, and the issue
did not trigger. However, upgrades also unlink the old files, as does
removing packages. Removing GNOME removed over 200 packages and the
issue triggered, but re-installing and removing GNOME again did not
trigger it. So nothing so far reproduces the issue reliably. With the
new kernel the issue reproduces even less frequently than with the 4.4
and 4.15 kernels, probably because some I/O scheduler change affects
the disk I/O patterns.

Thanks

Michal