Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)

From: Stephen Hemminger
Date: Fri Jan 20 2012 - 11:11:07 EST


On Fri, 20 Jan 2012 09:24:38 -0500
Michael Breuer <mbreuer@xxxxxxxxxx> wrote:

> On 1/16/2012 11:39 AM, Michael Breuer wrote:
> > Synopsis:
> >
> > Receiving DMAR and other errors after approximately three days of
> > uptime. The symptoms exactly match errors seen and then fixed around
> > 2.6.32.4.
> >
> > While the system remains unaffected for too long to do a bisect, I was
> > able to confirm that the problem exists in the 3.1 stable branch (I
> > jumped from 3.0 to 3.2 when 3.2 was released).
> >
> > For now I reverted to the sky2.c from 3.0.9 and am running the rest of
> > the kernel from 3.1.2, but won't be certain that this works until
> > later in the week.
> >
> > Note that 20 seconds prior to the log extract below were DHCP renewal
> > attempts on eth1, the issue below was on eth0. Not sure it's relevant,
> > however back in 2010 a preceding DHCP event did turn out to be
> > relevant to the manifestation of the bug.
> >
> > The 3.2.1-dirty I'm running is from git with a single local patch -
> > for sidewinder force-feedback support (shouldn't be relevant to the
> > sky2 issue).
> >
> > Log extract:
> >
> > Jan 16 05:49:46 mail kernel: [198230.628919] DRHD: handling fault
> > status reg 2
> > Jan 16 05:49:46 mail kernel: [198230.628925] sky2 0000:06:00.0: error
> > interrupt status=0x80000000
> > Jan 16 05:49:46 mail kernel: [198230.628929] DMAR:[DMA Read] Request
> > device [06:00.0] fault addr fff78000
> > Jan 16 05:49:46 mail kernel: [198230.628931] DMAR:[fault reason 06]
> > PTE Read access is not set
> > Jan 16 05:49:46 mail kernel: [198230.628939] sky2 0000:06:00.0: PCI
> > hardware error (0x2010)
> > Jan 16 05:49:53 mail dhclient[1616]: DHCPREQUEST on eth1 to
> > 10.240.184.29 port 67
> > Jan 16 05:50:01 mail kernel: [198246.288400] ------------[ cut here
> > ]------------
> > Jan 16 05:50:01 mail kernel: [198246.288408] WARNING: at
> > net/sched/sch_generic.c:255 dev_watchdog+0x247/0x250()
> > Jan 16 05:50:01 mail kernel: [198246.288411] Hardware name: System
> > Product Name
> > Jan 16 05:50:01 mail kernel: [198246.288413] NETDEV WATCHDOG: eth0
> > (sky2): transmit queue 0 timed out
> > Jan 16 05:50:01 mail kernel: [198246.288415] Modules linked in: tcp_lp
> > cpufreq_stats ebtable_nat ebtables nf_conntrack_netbios_ns
> > nf_conntrack_broadcast ip6table_mangle ip6table_filter ip6_tables
> > iptable_mangle ipt_MASQUERADE iptable_nat nf_nat iptable_raw tun
> > bridge stp llc lockd sit tunnel4 ipt_LOG nf_conntrack_ftp
> > nf_conntrack_ipv6 nf_defrag_ipv6 xt_CHECKSUM xt_multiport xt_DSCP
> > w83627ehf xt_mark xt_dscp hwmon_vid binfmt_misc raid1 btrfs sunrpc
> > zlib_deflate libcrc32c snd_hda_codec_analog snd_ens1371 gameport
> > snd_hda_intel snd_rawmidi snd_ac97_codec snd_hda_codec snd_hwdep
> > ac97_bus snd_seq snd_seq_device snd_pcm gspca_spca505 snd_timer
> > gspca_main snd videodev media soundcore i2c_i801 iTCO_wdt microcode
> > v4l2_compat_ioctl32 snd_page_alloc i7core_edac sky2 edac_core pcspkr
> > iTCO_vendor_support virtio_net virtio virtio_ring kvm_intel kvm uinput
> > ipv6 raid456 async_raid6_recov async_pq raid6_pq async_xor
> > firewire_ohci firewire_core pata_acpi ata_generic xor async_memcpy
> > async_tx crc_itu_t pata_marvell nouveau ttm d
> > Jan 16 05:50:01 mail kernel: rm_kms_helper drm i2c_algo_bit i2c_core
> > mxm_wmi video [last unloaded: nf_conntrack_broadcast]
> > Jan 16 05:50:01 mail kernel: [198246.288487] Pid: 0, comm: swapper/0
> > Tainted: G W 3.2.1-dirty #1
> > Jan 16 05:50:01 mail kernel: [198246.288489] Call Trace:
> > Jan 16 05:50:01 mail kernel: [198246.288491] <IRQ>
> > [<ffffffff81050a4f>] warn_slowpath_common+0x7f/0xc0
> > Jan 16 05:50:01 mail kernel: [198246.288501] [<ffffffff8101f0bd>] ?
> > lapic_next_event+0x1d/0x30
> > Jan 16 05:50:01 mail kernel: [198246.288504] [<ffffffff81050b46>]
> > warn_slowpath_fmt+0x46/0x50
> > Jan 16 05:50:01 mail kernel: [198246.288509] [<ffffffff81009319>] ?
> > read_tsc+0x9/0x20
> > Jan 16 05:50:01 mail kernel: [198246.288513] [<ffffffff814a81e7>]
> > dev_watchdog+0x247/0x250
> > Jan 16 05:50:01 mail kernel: [198246.288518] [<ffffffff8105fbbb>]
> > run_timer_softirq+0x12b/0x3b0
> > Jan 16 05:50:01 mail kernel: [198246.288521] [<ffffffff814a7fa0>] ?
> > qdisc_reset+0x50/0x50
> > Jan 16 05:50:01 mail kernel: [198246.288525] [<ffffffff81057d18>]
> > __do_softirq+0xa8/0x210
> > Jan 16 05:50:01 mail kernel: [198246.288529] [<ffffffff8157496c>]
> > call_softirq+0x1c/0x30
> > Jan 16 05:50:01 mail kernel: [198246.288533] [<ffffffff810041e5>]
> > do_softirq+0x65/0xa0
> > Jan 16 05:50:01 mail kernel: [198246.288536] [<ffffffff810580fe>]
> > irq_exit+0x8e/0xb0
> > Jan 16 05:50:01 mail kernel: [198246.288539] [<ffffffff815750a3>]
> > do_IRQ+0x63/0xe0
> > Jan 16 05:50:01 mail kernel: [198246.288543] [<ffffffff8156ad2e>]
> > common_interrupt+0x6e/0x6e
> > Jan 16 05:50:01 mail kernel: [198246.288545] <EOI>
> > [<ffffffff81307b6d>] ? intel_idle+0xed/0x150
> > Jan 16 05:50:01 mail kernel: [198246.288551] [<ffffffff81307b4f>] ?
> > intel_idle+0xcf/0x150
> > Jan 16 05:50:01 mail kernel: [198246.288555] [<ffffffff8144d331>]
> > cpuidle_idle_call+0xc1/0x280
> > Jan 16 05:50:01 mail kernel: [198246.288559] [<ffffffff8100122a>]
> > cpu_idle+0xca/0x120
> > Jan 16 05:50:01 mail kernel: [198246.288563] [<ffffffff8154741e>]
> > rest_init+0x72/0x74
> > Jan 16 05:50:01 mail kernel: [198246.288568] [<ffffffff81b6abdd>]
> > start_kernel+0x3b5/0x3c0
> > Jan 16 05:50:01 mail kernel: [198246.288572] [<ffffffff81b6a32b>]
> > x86_64_start_reservations+0x132/0x136
> > Jan 16 05:50:01 mail kernel: [198246.288576] [<ffffffff81b6a140>] ?
> > early_idt_handlers+0x140/0x140
> > Jan 16 05:50:01 mail kernel: [198246.288580] [<ffffffff81b6a431>]
> > x86_64_start_kernel+0x102/0x111
> > Jan 16 05:50:01 mail kernel: [198246.288583] ---[ end trace
> > bb26011d21a2b1d7 ]---
> > Jan 16 05:50:01 mail kernel: [198246.288586] sky2 0000:06:00.0: eth0:
> > tx timeout
> > Jan 16 05:50:01 mail kernel: [198246.288593] sky2 0000:06:00.0: eth0:
> > transmit ring 115 .. 10 report=115 done=115
> >
> >
> >
> FYI - I've been up for four days now without issues running on 3.2.1 +
> sky2.c from 3.0.9. Looks like the issue is in fact in one of the
> modifications made in sky2.c between those two releases.

Since only you seem to be able to reproduce it, most likely the
bisect burden will be on you. If you know it is only one file,
then bisecting that file is fairly quick.
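
To put a rough number on "fairly quick": git bisect can be limited to
commits that touch given paths (git bisect start <bad> <good> -- <path>),
and the number of kernels to build and test grows only with the base-2
logarithm of the candidate commits. A minimal sketch in Python, assuming a
roughly linear history. The commit count and the days-per-test figure are
hypothetical placeholders, not numbers taken from the tree:

```python
def bisect_steps(candidate_commits: int) -> int:
    """Worst-case number of test runs needed to binary-search
    `candidate_commits` commits, exactly one of which is the first bad one.

    Equals ceil(log2(n)); computed with integers to avoid float rounding.
    """
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (candidate_commits - 1).bit_length()


def estimated_days(candidate_commits: int, days_per_test: int) -> int:
    """Rough wall-clock estimate when each test needs `days_per_test` days
    of uptime before a kernel can be called good."""
    return bisect_steps(candidate_commits) * days_per_test


if __name__ == "__main__":
    # Hypothetical inputs: 20 commits touching the file, 4 days per test.
    commits, days = 20, 4
    print(bisect_steps(commits), "test runs,",
          estimated_days(commits, days), "days at most")
```

Non-linear history can shift the count by a step or so, so treat the
result as an estimate rather than a guarantee.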

