Re: Regression in 5.3-rc1 and later

From: Vincenzo Frascino
Date: Fri Aug 23 2019 - 06:42:11 EST


Hi Russell,

On 8/23/19 11:36 AM, Russell King - ARM Linux admin wrote:
> Hi,
>
> To everyone on the long Cc list...
>
> What's happening with this? I was about to merge the patches for 32-bit
> ARM, which I don't want to do if doing so will cause this regression on
> 32-bit ARM as well.
>

The regression was sorted out yesterday; a new patch is going through tip
timers/urgent and will be part of the next -rc.

If you want to merge them, there should be nothing blocking.

> Thanks.
>
> On Thu, Aug 22, 2019 at 07:57:59AM +0100, Chris Clayton wrote:
>> Hi everyone,
>>
>> Firstly, apologies to anyone on the long cc list that turns out not to be particularly interested in the following, but
>> you were all marked as cc'd in the commit message below.
>>
>> I've found a problem that isn't present in 5.2 series or 4.19 series kernels, and seems to have arrived in 5.3-rc1. The
>> problem is that if I suspend (to ram) my laptop, on resume 14 minutes or more after suspending, I have no networking
>> functionality. If I resume the laptop after 13 minutes or less, networking works fine. I haven't tried to get finer
>> grained timings between 13 and 14 minutes, but can do if it would help.
>>
>> ifconfig shows that wlan0 is still up and still has its assigned IP address but, for instance, a ping of any other
>> device on my network fails, as does pinging, say, kernel.org. I've tried "downing" the network (with /sbin/ifdown) and
>> unloading the iwlmvm module and then reloading the module and "upping" (/sbin/ifup) the network, but my network is still
>> unusable. I should add that the problem also manifests if I hibernate the laptop, although my testing of this has been
>> minimal. I can do more if required.
>>
>> As I say, the problem first appears in 5.3-rc1, so I've bisected between 5.2.0 and 5.3-rc1 and that concluded with:
>>
>> [chris:~/kernel/linux]$ git bisect good
>> 7ac8707479886c75f353bfb6a8273f423cfccb23 is the first bad commit
>> commit 7ac8707479886c75f353bfb6a8273f423cfccb23
>> Author: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
>> Date: Fri Jun 21 10:52:49 2019 +0100
>>
>> x86/vdso: Switch to generic vDSO implementation
>>
>> The x86 vDSO library requires some adaptations to take advantage of the
>> newly introduced generic vDSO library.
>>
>> Introduce the following changes:
>> - Modification of vdso.c to be compliant with the common vdso datapage
>> - Use of lib/vdso for gettimeofday
>>
>> [ tglx: Massaged changelog and cleaned up the function signature formatting ]
>>
>> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
>> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: linux-arch@xxxxxxxxxxxxxxx
>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>> Cc: linux-mips@xxxxxxxxxxxxxxx
>> Cc: linux-kselftest@xxxxxxxxxxxxxxx
>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>> Cc: Will Deacon <will.deacon@xxxxxxx>
>> Cc: Arnd Bergmann <arnd@xxxxxxxx>
>> Cc: Russell King <linux@xxxxxxxxxxxxxxx>
>> Cc: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
>> Cc: Paul Burton <paul.burton@xxxxxxxx>
>> Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
>> Cc: Mark Salyzyn <salyzyn@xxxxxxxxxxx>
>> Cc: Peter Collingbourne <pcc@xxxxxxxxxx>
>> Cc: Shuah Khan <shuah@xxxxxxxxxx>
>> Cc: Dmitry Safonov <0x7f454c46@xxxxxxxxx>
>> Cc: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
>> Cc: Huw Davies <huw@xxxxxxxxxxxxxxx>
>> Cc: Shijith Thotton <sthotton@xxxxxxxxxxx>
>> Cc: Andre Przywara <andre.przywara@xxxxxxx>
>> Link: https://lkml.kernel.org/r/20190621095252.32307-23-vincenzo.frascino@xxxxxxx
>>
>> arch/x86/Kconfig                         |   3 +
>> arch/x86/entry/vdso/Makefile             |   9 ++
>> arch/x86/entry/vdso/vclock_gettime.c     | 245 ++++---------------------------
>> arch/x86/entry/vdso/vdsox32.lds.S        |   1 +
>> arch/x86/entry/vsyscall/Makefile         |   2 -
>> arch/x86/entry/vsyscall/vsyscall_gtod.c  |  83 -----------
>> arch/x86/include/asm/pvclock.h           |   2 +-
>> arch/x86/include/asm/vdso/gettimeofday.h | 191 ++++++++++++++++++++++++
>> arch/x86/include/asm/vdso/vsyscall.h     |  44 ++++++
>> arch/x86/include/asm/vgtod.h             |  75 +---------
>> arch/x86/include/asm/vvar.h              |   7 +-
>> arch/x86/kernel/pvclock.c                |   1 +
>> 12 files changed, 284 insertions(+), 379 deletions(-)
>> delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c
>> create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h
>> create mode 100644 arch/x86/include/asm/vdso/vsyscall.h
>>
>> To confirm my bisection was correct, I did a git checkout of 7ac8707479886c75f353bfb6a8273f423cfccb2. As expected, the
>> kernel exhibited the problem I've described. However, a kernel built at the immediately preceding (parent?) commit
>> (bfe801ebe84f42b4666d3f0adde90f504d56e35b) has a working network after a (>= 14 minute) suspend/resume cycle.
>>
>> As the module name implies, I'm using wireless networking. The hardware is detected as "Intel(R) Wireless-AC 9260
>> 160MHz, REV=0x324" by iwlwifi.
>>
>> I'm more than happy to provide additional diagnostics (but may need a little hand-holding) and to apply diagnostic or
>> fix patches, but please cc me on any reply as I'm not subscribed to any of the kernel-related mailing lists.
>>
>> Chris
>>
>

--
Regards,
Vincenzo