x86-tip tsc/tick gripage
From: Mike Galbraith
Date: Wed Apr 26 2017 - 04:12:37 EST
Greetings,
After picking up the pieces of my tip-rt tree, I'm seeing grumbling on
two boxen, and it ain't me, the below is virgin tip. The second box
has crap BIOS (replacement ready to be installed), but works fine if
the sync code gets the things synchronized before giving up (bumping
loop count a wee bit makes that reliable). This first box is my
primary RT test box, not a racehorse (old nag), but highly reliable.
tip v4.11-rc8-893-g8ec9e12aff06, trusty ole 8 socket (X7560) DL980 G7
[ 11.856619] tsc: Refined TSC clocksource calibration: 2260.999 MHz
[ 11.867703] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x20974a4d8bb, max_idle_ns: 440795246623 ns
[ 12.951432] clocksource: Switched to clocksource tsc
[ 255.674776] clocksource: timekeeping watchdog on CPU54: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 255.895200] clocksource: 'tsc' cs_now: 11b2a29f798 cs_last: c4faa1e90a mask: ffffffffffffffff
[ 256.105724] tsc: Marking TSC unstable due to clocksource watchdog
[ 316.976942] ------------[ cut here ]------------
[ 316.980923] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:874 tick_nohz_stop_sched_tick+0x339/0x380
[ 316.980923] Modules linked in: autofs4(E) edd(E) af_packet(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_powersave(E) fuse(E) loop(E) md_mod(E) dm_mod(E) vhost_net(E) vhost(E) tap(E) tun(E) kvm_intel(E) iTCO_wdt(E) kvm(E) gpio_ich(E) iTCO_vendor_support(E) i7core_edac(E) ipmi_ssif(E) joydev(E) lpc_ich(E) edac_core(E) hpilo(E) bnx2(E) netxen_nic(E) sr_mod(E) hid_generic(E) hpwdt(E) irqbypass(E) mfd_core(E) shpchp(E) ipmi_si(E) pcc_cpufreq(E) ehci_pci(E) ipmi_msghandler(E) acpi_cpufreq(E) cdrom(E) sg(E) acpi_power_meter(E) pcspkr(E) button(E) ext4(E) mbcache(E) jbd2(E) crc16(E) usbhid(E) uhci_hcd(E) ehci_hcd(E) usbcore(E) thermal(E) sd_mod(E) scsi_dh_hp_sw(E) scsi_dh_emc(E) scsi_dh_rdac(E) scsi_dh_alua(E) ata_generic(E) ata_piix(E) libata(E) hpsa(E) scsi_transport_sas(E) cciss(E) scsi_mod(E)
[ 316.980923] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 4.11.0-tip-default #57
[ 316.980923] Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010
[ 316.980923] task: ffffffff81c104c0 task.stack: ffffffff81c00000
[ 316.980923] RIP: 0010:tick_nohz_stop_sched_tick+0x339/0x380
[ 316.980923] RSP: 0018:ffff88027fc03f40 EFLAGS: 00010093
[ 316.980923] RAX: 0000000000000001 RBX: ffff88027fc15d20 RCX: 00000049cc0cd28a
[ 316.980923] RDX: 00000068e05e6aa0 RSI: ffff88017e286010 RDI: ffff88027fc15e00
[ 316.980923] RBP: 00000049cc0cbf00 R08: ffffffffffffffdd R09: 0000000040001000
[ 316.980923] R10: 00000001000010b3 R11: 0000000000000000 R12: 00000049cc0cf0de
[ 316.980923] R13: ffff88027fc0d640 R14: 00000049a9b7af00 R15: 00000049a9b7af00
[ 316.980923] FS: 0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
[ 316.980923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 316.980923] CR2: 00007f229a37e000 CR3: 0000000001c09000 CR4: 00000000000006f0
[ 316.980923] Call Trace:
[ 316.980923] <IRQ>
[ 316.980923] ? __tick_nohz_idle_enter+0x8e/0x150
[ 316.980923] ? irq_exit+0x77/0xe0
[ 316.980923] ? do_IRQ+0x4c/0xd0
[ 316.980923] ? common_interrupt+0x8c/0x8c
[ 316.980923] </IRQ>
[ 316.980923] ? poll_idle+0x2d/0x58
[ 316.980923] ? cpuidle_enter_state+0x9d/0x260
[ 316.980923] ? do_idle+0x165/0x1d0
[ 316.980923] ? cpu_startup_entry+0x5d/0x70
[ 316.980923] ? start_kernel+0x481/0x48c
[ 316.980923] ? set_init_arg+0x50/0x50
[ 316.980923] ? early_idt_handler_array+0x120/0x120
[ 316.980923] ? x86_64_start_kernel+0x147/0x156
[ 316.980923] ? secondary_startup_64+0x9f/0x9f
[ 316.980923] Code: 49 39 cf 7c 26 48 89 c8 e9 94 fe ff ff f3 90 e9 03 fd ff ff 83 7b 48 02 75 d9 48 89 df e8 d0 1a ff ff 49 8b 45 18 e9 76 fe ff ff <0f> ff 80 3d ee 59 cd 00 00 0f 85 8d fe ff ff 31 c0 4c 89 fa 48
[ 316.980923] ---[ end trace 996f71f82094d582 ]---
[ 316.980923] basemono: 316956000000 ts->next_tick: 316380000000 dev->next_event: 316956005002
tip v4.11-rc8-887-gefb93ce741c6, 2 socket E7-8890 v3 (72 core) box
[ 29.473651] tsc: Refined TSC clocksource calibration: 2493.986 MHz
[ 29.473701] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23f308a7388, max_idle_ns: 440795280393 ns
[ 30.506471] clocksource: Switched to clocksource tsc
[ 716.538171] clocksource: timekeeping watchdog on CPU67: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 716.584589] clocksource: 'tsc' cs_now: d171cd5ce006e cs_last: d165a9ce422d1 mask: ffffffffffffffff
[ 716.609048] tsc: Marking TSC unstable due to clocksource watchdog
[ 1067.093433] ------------[ cut here ]------------
[ 1067.097416] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:874 tick_nohz_stop_sched_tick+0x339/0x380
[ 1067.097416] Modules linked in: af_packet(E) iscsi_ibft(E) intel_rapl(E) iscsi_boot_sysfs(E) sb_edac(E) edac_core(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) ext4(E) crc16(E) irqbypass(E) jbd2(E) mbcache(E) joydev(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) ipmi_ssif(E) ixgbe(E) iTCO_wdt(E) aesni_intel(E) i40e(E) iTCO_vendor_support(E) aes_x86_64(E) mdio(E) crypto_simd(E) lpc_ich(E) mptctl(E) ptp(E) glue_helper(E) ipmi_si(E) cryptd(E) pcspkr(E) pps_core(E) dca(E) mfd_core(E) i2c_i801(E) mptbase(E) ipmi_devintf(E) ipmi_msghandler(E) wmi(E) button(E) acpi_pad(E) shpchp(E) sunrpc(E) btrfs(E) xor(E) raid6_pq(E) sd_mod(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ahci(E)
[ 1067.097416] fb_sys_fops(E) ehci_pci(E) libahci(E) ttm(E) ehci_hcd(E) crc32c_intel(E) libata(E) drm(E) usbcore(E) mpt3sas(E) raid_class(E) scsi_transport_sas(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
[ 1067.097416] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 4.11.0-tip-default #226
[ 1067.097416] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
[ 1067.097416] task: ffffffff81c104c0 task.stack: ffffffff81c00000
[ 1067.097416] RIP: 0010:tick_nohz_stop_sched_tick+0x339/0x380
[ 1067.097416] RSP: 0018:ffff88085f803f40 EFLAGS: 00010087
[ 1067.097416] RAX: 0000000000000001 RBX: ffff88085f815d20 RCX: 000000f86d81fe00
[ 1067.097416] RDX: 800000f86d70d2ff RSI: 0000000000000048 RDI: ffff88085f815e00
[ 1067.097416] RBP: 000000f86d70d300 R08: ffffffffffffff2c R09: 000000004002ed00
[ 1067.097416] R10: 000000010002edd8 R11: 0000000000000000 R12: 000000f86d82121a
[ 1067.097416] R13: ffff88085f80d640 R14: 000000de6fa7af00 R15: 000000de6fa7af00
[ 1067.097416] FS: 0000000000000000(0000) GS:ffff88085f800000(0000) knlGS:0000000000000000
[ 1067.097416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1067.097416] CR2: 00007fba363082a7 CR3: 000000017c3f9000 CR4: 00000000001406f0
[ 1067.097416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1067.097416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1067.097416] Call Trace:
[ 1067.097416] <IRQ>
[ 1067.097416] ? __tick_nohz_idle_enter+0x8e/0x150
[ 1067.097416] ? irq_exit+0x77/0xe0
[ 1067.097416] ? do_IRQ+0x4c/0xd0
[ 1067.097416] ? common_interrupt+0x8c/0x8c
[ 1067.097416] </IRQ>
[ 1067.097416] ? cpuidle_enter_state+0xe3/0x260
[ 1067.097416] ? do_idle+0x165/0x1d0
[ 1067.097416] ? cpu_startup_entry+0x5d/0x70
[ 1067.097416] ? start_kernel+0x481/0x48c
[ 1067.097416] ? set_init_arg+0x50/0x50
[ 1067.097416] ? early_idt_handler_array+0x120/0x120
[ 1067.097416] ? x86_64_start_kernel+0x147/0x156
[ 1067.097416] ? secondary_startup_64+0x9f/0x9f
[ 1067.097416] Code: 49 39 cf 7c 26 48 89 c8 e9 94 fe ff ff f3 90 e9 03 fd ff ff 83 7b 48 02 75 d9 48 89 df e8 d0 1a ff ff 49 8b 45 18 e9 76 fe ff ff <0f> ff 80 3d 6e 5a cd 00 00 0f 85 8d fe ff ff 31 c0 4c 89 fa 48
[ 1067.097416] ---[ end trace 65463b720a8bd23d ]---
[ 1067.097416] basemono: 1066988000000 ts->next_tick: 955356000000 dev->next_event: 1066989125120