Re: tip.today - scheduler bam boom crash (cpu hotplug)

From: Peter Zijlstra
Date: Thu Jan 19 2017 - 05:35:46 EST


On Thu, Jan 19, 2017 at 08:31:09AM +0100, Mike Galbraith wrote:
> Mindless testing only, too sick to work, not sick enough to be immune
> to boredom. Was verifying first warning wasn't somehow rt inspired,
> but while doing so, plain nopreempt (and no rt patch set) went boom.
>
> [ 203.088255] smpboot: CPU 1 is now offline
> [ 203.168181] smpboot: CPU 2 is now offline
> [ 203.221461] x86: Booting SMP configuration:
> [ 203.221464] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 203.221728] ------------[ cut here ]------------
> [ 203.221733] WARNING: CPU: 1 PID: 0 at kernel/sched/clock.c:149 set_sched_clock_stable+0x43/0x50
> [ 203.221733] Modules linked in: nls_utf8(E) isofs(E) ebtable_filter(E) ebtables(E) fuse(E) nf_log_ipv6(E) xt_pkttype(E) xt_physdev(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) snd_hda_codec_hdmi(E) nls_cp437(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) kvm(E) snd_hda_intel(E)
> [ 203.221748] snd_hda_codec(E) irqbypass(E) crct10dif_pclmul(E) snd_hda_core(E) snd_hwdep(E) nfsd(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) pcbc(E) snd_pcm(E) auth_rpcgss(E) aesni_intel(E) aes_x86_64(E) snd_timer(E) nfs_acl(E) joydev(E) crypto_simd(E) snd(E) lockd(E) grace(E) iTCO_wdt(E) iTCO_vendor_support(E) lpc_ich(E) mei_me(E) i2c_i801(E) mei(E) pcspkr(E) glue_helper(E) mfd_core(E) shpchp(E) intel_smartconnect(E) sunrpc(E) soundcore(E) tpm_infineon(E) fan(E) thermal(E) battery(E) cryptd(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) libahci(E) xhci_pci(E) ehci_pci(E) xhci_hcd(E) ehci_hcd(E)
> [ 203.221765] ttm(E) libata(E) r8169(E) mii(E) drm(E) usbcore(E) fjes(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
> [ 203.221773] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G E 4.10.0-tip-default #29
> [ 203.221774] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [ 203.221774] Call Trace:
> [ 203.221778] dump_stack+0x63/0x90
> [ 203.221780] __warn+0xd1/0xf0
> [ 203.221782] warn_slowpath_null+0x1d/0x20
> [ 203.221782] set_sched_clock_stable+0x43/0x50
> [ 203.221784] early_init_intel+0x225/0x360
> [ 203.221785] init_intel+0x18/0x2d0
> [ 203.221786] identify_cpu+0x2d1/0x4d0
> [ 203.221786] identify_secondary_cpu+0x18/0x80
> [ 203.221789] smp_store_cpu_info+0x3e/0x40
> [ 203.221790] start_secondary+0x53/0x180
> [ 203.221791] start_cpu+0x14/0x14
> [ 203.221792] ---[ end trace 262c7e4b746d5a76 ]---


OK, you also forgot to say what you did to trigger this, but a little
playing around was enough to reproduce it. All that was required was
an offline + online cycle and *boom*.
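
For anyone wanting to try the same thing, here is a rough sketch of
such an offline + online cycle through the sysfs CPU hotplug files.
The helper names and the sysfs_root parameter are made up for
illustration, and the exact sequence (CPUs 1 and 2 off, then back on)
is only a guess based on the log above, not a confirmed recipe. On a
real system it needs root and a kernel with CPU hotplug support.

```python
import os

SYSFS_CPU = "/sys/devices/system/cpu"  # standard sysfs location for CPU devices


def set_cpu_online(cpu, online, sysfs_root=SYSFS_CPU):
    """Write 1 or 0 to cpuN/online to bring a CPU up or take it down.

    sysfs_root is a hypothetical override so the helper can be pointed
    at a scratch directory instead of the real sysfs tree.
    """
    path = os.path.join(sysfs_root, f"cpu{cpu}", "online")
    with open(path, "w") as f:
        f.write("1" if online else "0")


def hotplug_cycle(cpus, sysfs_root=SYSFS_CPU):
    """Offline every CPU in `cpus`, then online them again, in order.

    Returns the list of (cpu, online) steps that were performed.
    """
    # Assumed sequence: all requested CPUs go down first, then come back up.
    steps = [(cpu, False) for cpu in cpus] + [(cpu, True) for cpu in cpus]
    for cpu, online in steps:
        set_cpu_online(cpu, online, sysfs_root)
    return steps
```

Run as root, hotplug_cycle([1, 2]) would take CPUs 1 and 2 down and
bring them back; check dmesg afterwards for the set_sched_clock_stable
warning.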

I'll go have a prod. Thanks!