Re: 4.18-rc* regression: x86-32 troubles (with timers?)

From: Arnd Bergmann
Date: Sun Jul 15 2018 - 16:51:49 EST


On Sun, Jul 15, 2018 at 5:05 PM, Meelis Roos <mroos@xxxxxxxx> wrote:

>> > > I then tried multiple other machines. All x86-64 machines seem
>> > > unaffected, some x86-32 machines are affected (Athlon with AMD750
>> > > chipset, Fujitsu RX100-S2 with P4-3.4, and P4 with Intel 865 chipset),
>> > > some very similar x86-32 machines are unaffected. I have a different
>> > > customized kernel configuration on each of them; so far I have not
>> > > pinpointed any configuration option to be at fault.
>> > >
>> > > All machines run Debian unstable.
>> > >
>> > > 4.17.0 was working fine.
>> > >
>> > > Will continue with bisecting between 4.17.0 and
>> > > 4.18.0-rc1-00023-g9ffc59d57228.
>
> Bisection has finished (I'm usually away from the problematic
> computers in summer). The result is strange and seems unrelated:
>
> 0bc5fe857274133ca028ebb15ff2e8549a369916 is the first bad commit
> commit 0bc5fe857274133ca028ebb15ff2e8549a369916
> Author: Sudarsana Reddy Kalluru <sudarsana.kalluru@xxxxxxxxxx>
> Date: Sat May 5 18:42:59 2018 -0700
>
> qed*: Refactor mf_mode to consist of bits.

Agreed, that isn't the one you were looking for.

> `mf_mode' field indicates the multi-partitioning mode the device is
> configured to. This method doesn't scale very well, adding a new MF mode
> requires going over all the existing conditions, and deciding whether those
> are needed for the new mode or not.
> The patch defines a set of bit-fields for modes which are derived according
> to the mode info shared by the MFW and all the configuration would be made
> according to those. To add a new mode, there would be a single place where
> we'll need to go and choose which bits apply and which don't.
>
> Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@xxxxxxxxxx>
> Signed-off-by: Ariel Elior <ariel.elior@xxxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>
> :040000 040000 a3572846e1afb9ccfa9c4a84b0135a0057ade66f bdb7b28725a4f1bffe79ee384a3603b3127d6fdb M drivers
> :040000 040000 f90c7f26fd8445afa48c6679ed68fed294b23d7f 52119c547a82b268b5c173d3df94e267cc1297a0 M include
> mroos@rx100s2:~/linux$ nice git bisect log
> git bisect start
> # good: [29dcea88779c856c7dc92040a0c01233263101d4] Linux 4.17
> git bisect good 29dcea88779c856c7dc92040a0c01233263101d4
> # good: [e27c49291a7fe9dc415c9fcab5bd781ec82dfe04] x86: Convert x86_platform_ops to timespec64
> git bisect good e27c49291a7fe9dc415c9fcab5bd781ec82dfe04
> # bad: [1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad 1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21
> # bad: [1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad 1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21
> # good: [135c5504a600ff9b06e321694fbcac78a9530cd4] Merge tag 'drm-next-2018-06-06-1' of git://anongit.freedesktop.org/drm/drm
> git bisect good 135c5504a600ff9b06e321694fbcac78a9530cd4
> # bad: [ffbc9197b4721634dc6c0fefa9b31e565fa89cee] wcn36xx: improve debug and error messages for SMD
> git bisect bad ffbc9197b4721634dc6c0fefa9b31e565fa89cee
> # good: [3a443bd6dd7c43bf5763779309514bf3e7c1c3eb] net/9p: correct the variable name in v9fs_get_trans_by_name() comment
> git bisect good 3a443bd6dd7c43bf5763779309514bf3e7c1c3eb
> # bad: [93c65d13d8a0b7c272868d4a9779f96fc973df26] vmxnet3: Replace msleep(1) with usleep_range()
> git bisect bad 93c65d13d8a0b7c272868d4a9779f96fc973df26
> # good: [4bc871984f7cb5b2dec3ae64b570cb02f9ce2227] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> git bisect good 4bc871984f7cb5b2dec3ae64b570cb02f9ce2227

Every step in your log after this point is 'bad', which can be an
indication that you misclassified one of the commits above as 'good' when
it should have been 'bad'. The most likely explanations are that you either
typed 'git bisect good' by accident, or that the failure is not 100%
reliable, so the machine sometimes works fine even on a broken kernel.
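If the failure only shows up some of the time, a single passing test is weak
evidence for 'good', while a single failure is strong evidence for 'bad'. One
way to act on that is to repeat the test before answering 'good'. This is a
minimal sketch that assumes the regression can be detected by a script that
exits non-zero when it reproduces, which for a boot-time timer problem like
this one may well need a second machine or manual checking. The names
`retest` and `boot-and-check.sh` and the count of five runs are hypothetical;
the sketch only relies on the documented `git bisect run` convention that
exit code 0 means good, 125 means skip, and other codes from 1 to 127 mean bad.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a flaky bisection step: run the test command
# several times and report "good" only if every run passes.
# Exit status follows the `git bisect run` convention:
#   0 = good, 1 = bad (125 would mean "skip"; it is not used here).
retest() {
  local runs=$1
  shift
  local i
  for ((i = 1; i <= runs; i++)); do
    if ! "$@"; then
      # One failure is enough: a good kernel should never show the bug.
      echo "run $i/$runs failed: marking this commit bad" >&2
      return 1
    fi
  done
  # Passing every run only makes "good" more likely; a rare bug can still hide.
  echo "all $runs runs passed: marking this commit good" >&2
  return 0
}

# Example use in a bisect session (boot-and-check.sh is a made-up script
# that exits non-zero when the regression shows up):
#   git bisect run bash -c 'source ./retest.sh && retest 5 ./boot-and-check.sh'
```

Returning on the first failure keeps bad commits cheap to classify; the cost
of the extra runs is only paid on commits that look good.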

0bc5fe857274133ca0 follows directly after 3a443bd6dd7c ("net/9p: correct the
variable name in v9fs_get_trans_by_name() comment"), which you marked "good".
Since you are not using the 'qed' driver, 0bc5fe85727413 should not affect
your machines at all, so 3a443bd6dd7c can't really be good if 0bc5fe85727413
is bad.

I'd retest 3a443bd6dd7c to see whether it should have been 'bad', and if
it was, test v4.17-rc4, which is what the net-next tree was based on.
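To illustrate how one wrong 'good' steers the result, here is a toy
reproduction in a throwaway repository. Everything in it is an assumption
made for the demo: a linear history of empty commits c1..c8 (the real range
contains merges and many more commits), a regression pretended to start at
c3, and a test that is always correct, except that c5 is marked 'good' by
hand, as if typed by accident. Only the git subcommands themselves (bisect
start/good/run/log/reset/replay) are real; editing the log and replaying it
follows the workflow described in the git-bisect documentation.

```shell
#!/usr/bin/env bash
# Toy reproduction (hypothetical history, not the kernel commits above):
# eight empty commits c1..c8 on a linear branch, regression "introduced" in c3.
set -euo pipefail

repo=$(mktemp -d)
oracle=$(mktemp)
log=$(mktemp)
cd "$repo"
git init -q
git config user.name "bisect demo"
git config user.email "demo@example.invalid"
git config commit.gpgsign false
for i in 1 2 3 4 5 6 7 8; do
  git commit -q --allow-empty -m "c$i"
done
c1=$(git rev-parse HEAD~7)
c5=$(git rev-parse HEAD~3)
c8=$(git rev-parse HEAD)

# A test that is always right: exit 0 (good) for c1 and c2, exit 1 (bad)
# from c3 onwards. It reads the commit number from the subject line.
cat > "$oracle" <<'EOF'
n=$(git log -1 --format=%s | tr -d c)
[ "$n" -lt 3 ]
EOF

# First pass: c5 is marked 'good' by mistake, although it has the bug.
git bisect start "$c8" "$c1" >/dev/null
git bisect good "$c5" >/dev/null
git bisect run sh "$oracle" >/dev/null 2>&1
first_pass=$(git log -1 --format=%s refs/bisect/bad)
echo "first pass blames: $first_pass"

# Repair: drop the bogus 'good' entry (and its comment line) from the log,
# restart from the edited log, and let the correct test finish the job.
git bisect log | grep -v "$c5" > "$log"
git bisect reset >/dev/null 2>&1
git bisect replay "$log" >/dev/null
git bisect run sh "$oracle" >/dev/null 2>&1
second_pass=$(git log -1 --format=%s refs/bisect/bad)
echo "after replay: $second_pass"

git bisect reset >/dev/null 2>&1
cd / && rm -rf "$repo" "$oracle" "$log"
```

On the first pass the bisection blames c6, the child of the wrongly marked
commit, much like 0bc5fe857274 was blamed here; once the bogus entry is
dropped and the log replayed, the remaining steps land on c3. Replaying keeps
the correct decisions already made, so only the range that the bad entry had
cut off needs retesting.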

Arnd