Re: kernel-4.9.270 crash

From: Greg KH
Date: Mon Sep 06 2021 - 06:52:27 EST


On Mon, Sep 06, 2021 at 11:36:11AM +0200, wim wrote:
> On Mon, Sep 06, 2021 at 06:59:22AM +0200, Greg KH wrote:
> > On Sun, Sep 05, 2021 at 09:00:45PM +0200, wim wrote:
> > > On Sun, Sep 05, 2021 at 01:52:31AM +0200, wim wrote:
> > > >
> > > > Hello Greg,
> > > >
> > > > from kernel-4.9.270 up until now (4.9.282) I experience kernel crashes upon
> > > > loading a GPU module.
> > > > It happens on two out of at least six different machines.
> > > > I can't believe that I'm the only one where that happens, but since the bug
> > > > is still there twelve versions later, I need to report this.
> > > > ...
> >
> > Do you have any kernel log messages when these crashes happen?
>
> On the AMD machine:
>
> Aug 1 20:51:24 djo kernel: [drm] Initialized
> Aug 1 20:51:24 djo kernel: checking generic (a0000 10000) vs hw (e0000000 8000000)
> Aug 1 20:51:24 djo kernel: checking generic (a0000 10000) vs hw (ea000000 1000000)
> Aug 1 20:51:24 djo kernel: fb: switching to nouveaufb from VGA16 VGA
> Aug 1 20:51:24 djo kernel: divide error: 0000 [#1] SMP
> Aug 1 20:51:24 djo kernel: Modules linked in: nouveau(+) video drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm agpgart i2c_algo_bit tun lirc_serial(C) lirc_dev arc4 binfmt_misc snd_pcm_oss snd_mixer_oss fbcon bitblit softcursor font tileblit ath9k_htc ath9k_common ath9k_hw ath mac80211 cfg80211 uvcvideo rfkill firmware_class snd_usb_audio sr9700 videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib videobuf2_v4l2 dm9601 videobuf2_core usbnet snd_rawmidi mii usb_storage snd_hda_codec_generic kvm snd_hda_intel irqbypass snd_hda_codec gpio_ich ppdev snd_hwdep pcspkr snd_hda_core snd_pcm uhci_hcd ohci_pci snd_timer ohci_hcd lpc_ich ehci_pci snd ehci_hcd wmi mfd_core usbcore soundcore parport_pc floppy usb_common parport acpi_cpufreq button processor
> Aug 1 20:51:24 djo kernel: CPU: 0 PID: 2791 Comm: modprobe Tainted: G C 4.9.277 #1
> Aug 1 20:51:24 djo kernel: Hardware name: Hewlett-Packard HP xw4300 Workstation/0A00h, BIOS 786D3 v01.08 03/10/2006
> Aug 1 20:51:24 djo kernel: task: f6317080 task.stack: f4058000
> Aug 1 20:51:24 djo kernel: EIP: 0060:[<c02f789d>] EFLAGS: 00010206 CPU: 0
> Aug 1 20:51:24 djo kernel: EAX: 00000190 EBX: ffffffea ECX: 00000019 EDX: 00000000
> Aug 1 20:51:24 djo kernel: ESI: f52db800 EDI: 00000050 EBP: c02f7838 ESP: f4059c10
> Aug 1 20:51:24 djo kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Aug 1 20:51:24 djo kernel: CR0: 80050033 CR2: 080a1a54 CR3: 35234000 CR4: 00000690
> Aug 1 20:51:24 djo kernel: Stack:
> Aug 1 20:51:24 djo kernel: 00000050 f52db800 00000019 c0340732 00000000 000000a0 000000a0 00000fa0
> Aug 1 20:51:24 djo kernel: f62f4000 0000001e 00000000 00000000 f5a63800 00000000 00000000 00000000
> Aug 1 20:51:24 djo kernel: 00000000 00000000 f6024000 00000000 f52db800 00000001 00000000 00000000
> Aug 1 20:51:24 djo kernel: Call Trace:
> Aug 1 20:51:24 djo kernel: [<c0340732>] ? 0xc0340732
> Aug 1 20:51:24 djo kernel: [<c0340988>] ? 0xc0340988
> Aug 1 20:51:24 djo kernel: [<c02f734a>] ? 0xc02f734a
> Aug 1 20:51:24 djo kernel: [<c033f780>] ? 0xc033f780
> Aug 1 20:51:24 djo kernel: [<c0340b32>] ? 0xc0340b32
> Aug 1 20:51:24 djo kernel: [<c0340d20>] ? 0xc0340d20
> Aug 1 20:51:24 djo kernel: [<f8bc4ef7>] ? 0xf8bc4ef7
> Aug 1 20:51:24 djo kernel: [<c0163715>] ? 0xc0163715

<snip>

These raw addresses aren't going to help us much.  Can you turn on
debugging symbols so that the crash traces show the symbol names?
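
Bare addresses like "? 0xc0340732" in a call trace usually mean the
kernel was built without CONFIG_KALLSYMS.  As a rough sketch, assuming
you build your own kernel and that these options are currently off in
your .config (I have not seen your config, so that is a guess), the
relevant lines would look like this:

```
# Resolve addresses in oops output to symbol names
CONFIG_KALLSYMS=y
# Optional: keep debug info in vmlinux for gdb/addr2line
CONFIG_DEBUG_INFO=y
```

After rebuilding and rebooting, the same crash should print function
names and offsets instead of raw hex values.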

<snip>

> > Can you use 'git bisect' to track down the offending commit?
>
> If I would know how to do that

'man git bisect' should provide a tutorial on how to do this.
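
To give an idea of the workflow, here is a minimal, self-contained
sketch that runs a bisection on a throwaway repository.  Everything in
it (the repository, the "divisor" file, the v1..v10 tags) is made up
for illustration and only stands in for the kernel tree.  On the real
stable tree the endpoints would presumably be v4.9.270 as "bad" and
v4.9.269 as "good", assuming 4.9.269 really does work for you, and the
test step would be "build, boot, load the GPU module" rather than a
one-line check.

```shell
#!/bin/sh
set -e

# Throwaway repository standing in for the kernel tree (hypothetical data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "bisect@example.invalid"
git config user.name "bisect demo"

# Ten tagged "releases"; the bug (divisor becomes 0) appears in release 7.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then echo 0 > divisor; else echo 1 > divisor; fi
    echo "$i" > version
    git add divisor version
    git commit -q -m "release $i"
    git tag "v$i"
    i=$((i + 1))
done

# Known-bad revision first, then known-good.
git bisect start v10 v1 >/dev/null

# Exit status 0 marks a revision good, non-zero (other than 125) marks it bad.
git bisect run sh -c 'test "$(cat divisor)" -ne 0' >/dev/null

# When the run finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git show -s --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

When a crash can't be detected by a script, the same thing is done by
hand: after each build and boot, run "git bisect good" or "git bisect
bad" until git names the first bad commit, then "git bisect reset".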

> > And why are you stuck on 4.9.y for these machines? Why not use 5.10 or
> > newer?
>
> Because in 4.10 they dropped lirc-serial and I need that. The new ir-serial
> is no replacement. (The last working version of LIRC is 0.9.6. After that
> they destroyed transmitter support.)
>
> (I believe irda support got dropped too, which I need for my old nokia.)

If the new functionality is not working properly, please work with those
developers to fix that up.  Sticking with the 4.9.y kernel isn't going
to be a good long-term solution for you.

thanks,

greg k-h