Re: possible issue w/ 4.1.9, 4.1.10

From: Udo van den Heuvel
Date: Fri Oct 09 2015 - 11:41:28 EST


On 2015-10-05 15:16, Udo van den Heuvel wrote:
> Did I miss anything that is needed to avoid this?
> Is this a known issue?
>
> Please let me know.

I finally captured some logging.
I booted 4.1.10 in single-user mode and everything appeared fine.
Then I switched to multi-user mode (whatever systemd does behind `init 3`):
no problem.
Then I switched to what used to be runlevel 5, and after a short while the
problems reappeared.
Besides the network issue shown in the log, I had disk issues as well:
md0 lost a partition, and md1 lost a partition on a different disk.
None of these issues occur with 4.1.8.

[ 335.353756] Bluetooth: BNEP socket layer initialized
[ 696.811881] ------------[ cut here ]------------
[ 696.811891] WARNING: CPU: 3 PID: 20 at net/sched/sch_generic.c:303 dev_watchdog+0x250/0x260()
[ 696.811893] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 696.811895] Modules linked in: bnep bluetooth fuse edac_core
cpufreq_userspace eeprom msr it87 hwmon_vid nfsd auth_rpcgss
oid_registry nfs_acl lockd grace sunrpc ip6t_REJECT
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_reject_ipv6 ipt_REJECT
nf_conntrack_ipv6 nf_reject_ipv4 xt_tcpudp nf_defrag_ipv6 iptable_filter
ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack iptable_nat
nf_conntrack_ipv4 ip6table_filter nf_defrag_ipv4 nf_nat_ipv4 ip6_tables
nf_nat nf_conntrack pwc uvcvideo videobuf2_vmalloc videobuf2_memops
videobuf2_core v4l2_common videodev snd_usb_audio snd_usbmidi_lib
snd_hwdep snd_rawmidi ppdev kvm_amd kvm snd_hda_codec_realtek
snd_hda_codec_generic cp210x usbserial microcode snd_hda_intel cdc_acm
snd_hda_controller snd_hda_codec snd_hda_core snd_seq snd_seq_device
evdev parport_serial
[ 696.811940] k10temp parport_pc parport snd_pcm xhci_pci snd_timer
snd xhci_hcd i2c_piix4 button acpi_cpufreq binfmt_misc ip_tables
x_tables ecb hid_generic usbhid ohci_pci ohci_hcd ehci_pci ehci_hcd
sr_mod cdrom radeon fbcon bitblit softcursor font cfbfillrect cfbimgblt
cfbcopyarea i2c_algo_bit backlight drm_kms_helper ttm drm fb fbdev autofs4
[ 696.811970] CPU: 3 PID: 20 Comm: ksoftirqd/3 Not tainted 4.1.10 #5
[ 696.811972] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./F2A85X-UP4, BIOS F5a 04/30/2013
[ 696.811974] ffffffffb46d1a21 0000000001778562 ffffffffb46d1a21
ffffffffb4552dcb
[ 696.811977] ffff88042e90bcd0 ffffffffb40730db 0000000000000000
ffff88042e1643a0
[ 696.811980] 0000000000000003 ffff88042e164000 0000000000000001
ffffffffb4073188
[ 696.811984] Call Trace:
[ 696.811989] [<ffffffffb4552dcb>] ? dump_stack+0x4a/0x74
[ 696.811993] [<ffffffffb40730db>] ? warn_slowpath_common+0x8b/0xe0
[ 696.811996] [<ffffffffb4073188>] ? warn_slowpath_fmt+0x58/0x80
[ 696.811999] [<ffffffffb4496cb0>] ? dev_watchdog+0x250/0x260
[ 696.812002] [<ffffffffb4496a60>] ? qdisc_rcu_free+0x30/0x30
[ 696.812005] [<ffffffffb40c0143>] ? call_timer_fn.isra.6+0x23/0x90
[ 696.812008] [<ffffffffb4496a60>] ? qdisc_rcu_free+0x30/0x30
[ 696.812010] [<ffffffffb40c03a8>] ? run_timer_softirq+0x1f8/0x2b0
[ 696.812013] [<ffffffffb4001470>] ? __switch_to+0x20/0x600
[ 696.812016] [<ffffffffb40760a7>] ? __do_softirq+0xf7/0x1f0
[ 696.812018] [<ffffffffb40761b9>] ? run_ksoftirqd+0x19/0x40
[ 696.812022] [<ffffffffb40909b0>] ? smpboot_thread_fn+0x170/0x250
[ 696.812025] [<ffffffffb4090840>] ? sort_range+0x20/0x20
[ 696.812027] [<ffffffffb408dc38>] ? kthread+0xc8/0xe0
[ 696.812029] [<ffffffffb408db70>] ? kthread_worker_fn+0x180/0x180
[ 696.812032] [<ffffffffb4558912>] ? ret_from_fork+0x42/0x70
[ 696.812035] [<ffffffffb408db70>] ? kthread_worker_fn+0x180/0x180
[ 696.812037] ---[ end trace 4e22e3f455b32613 ]---
[ 749.858395] INFO: rcu_preempt self-detected stall on CPU { 1} (t=15000 jiffies g=61792 c=61791 q=297918)
[ 749.858444] Task dump for CPU 1:
[ 749.858459] ksoftirqd/1 R running task 0 12 2 0x00000008
[ 749.858491] ffff88043ec83dd8 00000000b29c67b2 ffffffffb4741c80
ffffffffb40ba775
[ 749.858525] 000000000000f160 ffff88043ec95100 ffffffffb4741c80
0000000000048bbe
[ 749.858559] 0000000000000000 ffffffffb40bdf6c ffff88043ec8f7e0
0000000000000046
[ 749.858593] Call Trace:
[ 749.858604] <IRQ> [<ffffffffb40ba775>] ? rcu_dump_cpu_stacks+0x85/0xe0
[ 749.858637] [<ffffffffb40bdf6c>] ? rcu_check_callbacks+0x43c/0x840
[ 749.858663] [<ffffffffb40d0de0>] ? tick_sched_handle.isra.6+0x30/0x30
[ 749.858690] [<ffffffffb40c208a>] ? hrtimer_run_queues+0x3a/0x120
[ 749.858714] [<ffffffffb40d0de0>] ? tick_sched_handle.isra.6+0x30/0x30
[ 749.858740] [<ffffffffb40c0d59>] ? update_process_times+0x39/0x70
[ 749.858764] [<ffffffffb40d0de0>] ? tick_sched_handle.isra.6+0x30/0x30
[ 749.858789] [<ffffffffb40d0e28>] ? tick_sched_timer+0x48/0x90
[ 749.858813] [<ffffffffb40c15c0>] ? __run_hrtimer.isra.5+0x60/0x150
[ 749.858839] [<ffffffffb40c1d4d>] ? hrtimer_interrupt+0xfd/0x290
[ 749.858863] [<ffffffffb4033450>] ? smp_trace_apic_timer_interrupt+0x60/0xa0
[ 749.858891] [<ffffffffb455931b>] ? apic_timer_interrupt+0x6b/0x70
[ 749.858915] <EOI> [<ffffffffb40c004a>] ? del_timer_sync+0x3a/0x50
[ 749.858943] [<ffffffffb40c0052>] ? del_timer_sync+0x42/0x50
[ 749.858967] [<ffffffffb44ba8c7>] ? inet_csk_reqsk_queue_drop+0xa7/0x210
[ 749.858993] [<ffffffffb44bab15>] ? reqsk_timer_handler+0xe5/0x2f0
[ 749.859021] [<ffffffffb44baa30>] ? inet_csk_reqsk_queue_drop+0x210/0x210
[ 749.859047] [<ffffffffb40c0143>] ? call_timer_fn.isra.6+0x23/0x90
[ 749.859072] [<ffffffffb44baa30>] ? inet_csk_reqsk_queue_drop+0x210/0x210
[ 749.859098] [<ffffffffb40c03a8>] ? run_timer_softirq+0x1f8/0x2b0
[ 749.859122] [<ffffffffb4001470>] ? __switch_to+0x20/0x600
[ 749.859145] [<ffffffffb40760a7>] ? __do_softirq+0xf7/0x1f0
[ 749.859167] [<ffffffffb40761b9>] ? run_ksoftirqd+0x19/0x40
[ 749.859190] [<ffffffffb40909b0>] ? smpboot_thread_fn+0x170/0x250
[ 749.859214] [<ffffffffb4090840>] ? sort_range+0x20/0x20
[ 749.859235] [<ffffffffb408dc38>] ? kthread+0xc8/0xe0
[ 749.859256] [<ffffffffb408db70>] ? kthread_worker_fn+0x180/0x180
[ 749.859280] [<ffffffffb4558912>] ? ret_from_fork+0x42/0x70
[ 749.859302] [<ffffffffb408db70>] ? kthread_worker_fn+0x180/0x180
[ 756.722904] INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=15002 jiffies, g=-11, c=-12, q=1)
[ 756.724309] Task dump for CPU 1:
[ 756.725643] ksoftirqd/1 R running task 0 12 2 0x00000008
[ 756.726956] ffffffffb40760a7 0000000004208040 0000000100017dc6
ffff88042e8d0000
[ 756.728221] 042080400000000a 0000000000000001 ffff88042e89d7f0
ffff88042e89a280
[ 756.729459] ffffffffb473cd00 0000000000000001 0000000000000000
0000000000000000
[ 756.730673] Call Trace:
[ 756.731862] [<ffffffffb40760a7>] ? __do_softirq+0xf7/0x1f0
[ 756.733053] [<ffffffffb40761b9>] ? run_ksoftirqd+0x19/0x40
[ 756.734251] [<ffffffffb40909b0>] ? smpboot_thread_fn+0x170/0x250
[ 756.735439] [<ffffffffb4090840>] ? sort_range+0x20/0x20
[ 756.736606] [<ffffffffb408dc38>] ? kthread+0xc8/0xe0
[ 756.737785] [<ffffffffb408db70>] ? kthread_worker_fn+0x180/0x180
[ 756.738957] [<ffffffffb4558912>] ? ret_from_fork+0x42/0x70
[ 756.740128] [<ffffffffb408db70>] ? kthread_worker_fn+0x180/0x180
[ 758.763590] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0xd0000 action 0x6 frozen
[ 758.764837] ata6: SError: { PHYRdyChg CommWake 10B8B }
[ 758.766062] ata6.00: failed command: FLUSH CACHE EXT
[ 758.767304] ata6.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 1
         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 758.769824] ata6.00: status: { DRDY }
[ 758.771080] ata6: hard resetting link
[ 759.262751] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 764.254450] ata6.00: qc timeout (cmd 0xec)
[ 764.256699] ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 764.258946] ata6.00: revalidation failed (errno=-5)
[ 764.261185] ata6: hard resetting link
[ 764.753632] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)


Does this shed any light on what is happening here?
Hardware appears fine with 4.1.8. Disks sync OK, no defects seen.
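
To see how the network, RCU and disk failures line up in time, the log above
can be reduced to a short timeline. What follows is only a minimal sketch,
assuming the log has been saved as plain text. The function name `timeline`
and the labels `net`, `rcu` and `ata` are illustrative and not part of any
kernel tooling; the patterns are taken from the messages in this log only.

```python
import re

# Matches a dmesg record such as "[ 696.811893] NETDEV WATCHDOG: ..."
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")

# Illustrative labels; patterns copied from the messages in this log.
PATTERNS = [
    ("net", re.compile(r"NETDEV WATCHDOG")),
    ("rcu", re.compile(r"detected stalls?\b")),
    ("ata", re.compile(r"^ata\d+(\.\d+)?: (exception|failed|hard resetting)")),
]


def timeline(lines):
    """Return (seconds, label, message) for each matching log record."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = STAMP.match(line)
        if m:
            # A new timestamped record starts here.
            records.append([float(m.group(1)), m.group(2)])
        elif records and line.strip():
            # No timestamp: treat as a continuation wrapped by the mail client.
            records[-1][1] += " " + line.strip()

    events = []
    for stamp, msg in records:
        for label, pattern in PATTERNS:
            if pattern.search(msg):
                events.append((stamp, label, msg))
                break
    return events
```

Calling `timeline(open("dmesg.txt"))` (the file name is hypothetical) and
printing the result gives one line per event, which makes it easier to see
that the transmit timeout comes first, followed by the RCU stalls and then
the ata6 errors.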


Kind regards,
Udo