Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit()

From: Jarek Poplawski
Date: Fri Jan 08 2010 - 02:45:55 EST


On Thu, Jan 07, 2010 at 06:11:34PM -0500, Michael Breuer wrote:
> Results:
> * no MMAP, mtu=1500, neither alternative patch loaded: adapter crashed:
> Jan 7 15:44:23 mail kernel: DRHD: handling fault status reg 2
> Jan 7 15:44:23 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
> fault addr fffb7bffe000
> Jan 7 15:44:23 mail kernel: DMAR:[fault reason 06] PTE Read access is
> not set
> Jan 7 15:44:23 mail kernel: sky2 0000:06:00.0: error interrupt
> status=0x80000000
> Jan 7 15:44:23 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
> Jan 7 15:44:24 mail smbd[6572]: [2010/01/07 15:44:24, 0]
> lib/util_sock.c:539(read_fd_with_timeout)
> Jan 7 15:44:24 mail smbd[6572]: [2010/01/07 15:44:24, 0]
> lib/util_sock.c:1491(get_peer_addr_internal)
> Jan 7 15:44:24 mail smbd[6572]: getpeername failed. Error was
> Transport endpoint is not connected
> Jan 7 15:44:24 mail smbd[6572]: read_fd_with_timeout: client 0.0.0.0
> read error = Connection timed out.
> Jan 7 15:44:44 mail kernel: ------------[ cut here ]------------
> Jan 7 15:44:44 mail kernel: WARNING: at net/sched/sch_generic.c:261
> dev_watchdog+0xf3/0x164()
> Jan 7 15:44:44 mail kernel: Hardware name: System Product Name
> Jan 7 15:44:44 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
> queue 0 timed out
> Jan 7 15:44:44 mail kernel: Modules linked in: ip6table_filter
> ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat nf_nat
> iptable_mangle iptable_raw bridge stp appletalk psnap llc nfsd lockd
> nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc acpi_cpufreq sit
> tunnel4 ipt_LOG nf_conntrack_netbios_ns nf_conntrack_ftp xt_DSCP xt_dscp
> xt_MARK nf_conntrack_ipv6 xt_multiport ipv6 dm_multipath kvm_intel kvm
> snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec
> snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device
> snd_pcm gspca_spca505 gspca_main firewire_ohci videodev v4l1_compat
> firewire_core pcspkr v4l2_compat_ioctl32 snd_timer iTCO_wdt i2c_i801
> crc_itu_t iTCO_vendor_support snd soundcore snd_page_alloc sky2 wmi
> asus_atk0110 hwmon fbcon tileblit font bitblit softcursor raid456
> async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
> raid1 ata_generic pata_acpi pata_marvell nouveau ttm drm_kms_helper drm
> agpgart fb i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfil
> Jan 7 15:44:44 mail kernel: lrect [last unloaded: microcode]
> Jan 7 15:44:44 mail kernel: Pid: 0, comm: swapper Tainted: G W

BTW, was there any other oops saved before this one?

...
> --- adapter dead after this --- rebooted.
> * no MMAP; alternative 1 patch, mtu=1500; no errors; sustained transfer
> rates about 25% lower than what I saw with mmap enabled...(before MMAP
> enabled crashed).

?? Read below...

> * no MMAP mtu=9000; ran ok at low transfer rates - when high rates
> kicked in, got the sky2 interrupt error & things went south:
> Jan 7 15:09:28 mail kernel: sky2 0000:06:00.0: error interrupt
> status=0x40000008
> Jan 7 15:09:28 mail kernel: sky2 0000:06:00.0: error interrupt
> status=0x40000008
> After this, remote connections broke and I rebooted... decided to rerun
> w/o MMAP again before going back to MMAP and trying those other sky2
> options...
> * Retest of no MMAP + Alternative 1 - just to confirm consistency.
> Worked - no errors. Only version so far that allows the win7 backup to
> complete.

??? Hmm... Neither Alternative 1 nor 2 even gets compiled in when
there is no MMAP, so it definitely needs re-retesting ;-)
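
In case it helps with the re-retest, here is a minimal sketch for checking
whether a kernel was built with packet MMAP at all. It assumes that "MMAP"
here means CONFIG_PACKET_MMAP, and that the config of the running kernel is
readable from /proc/config.gz or /boot/config-<release>. The helper names
are made up for illustration.

```python
import gzip
import os
import platform


def option_enabled(config_text, option):
    """Return True if `option` is set to y or m in .config-style text."""
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines start with '#', so they never match.
        if line.startswith(option + "="):
            return line.split("=", 1)[1] in ("y", "m")
    return False


def read_running_config():
    """Best-effort read of the running kernel's config, or None if absent."""
    # Assumed locations; not every distribution provides either file.
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as fh:
            return fh.read()
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not found")
    else:
        enabled = option_enabled(text, "CONFIG_PACKET_MMAP")
        print("CONFIG_PACKET_MMAP enabled:", enabled)
```

If this reports the option as disabled, a "no MMAP + Alternative 1" run is
effectively the same code as a plain "no MMAP" run.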

> * MMAP + NO DMAR + disable_msi=1... also works w/o errors... leaving
> this one running for a while - also completed a backup successfully.
> Fastest of the lot... about 3x faster than any other version, working or
> not.

Very interesting. It would be nice to give it a really long run, and
if it still holds, to try MMAP + NO DMAR alone (without disable_msi).

>
> I'm leaving this one running for now. Not retesting jumbo for now. Be
> happy to help dig further.
>
> Tentative recommendations:
>
> 1) The af alternative patch seems rather necessary. First alternative
> seems to be working, I'd suggest that be submitted and backported to
> 2.6.32.
> 2) Steven's pskb_may_pull patch also ought to be included and backported.
> 3) Jumbo frame support for yukon2 should probably be disabled until/if
> fixed.
> 4) When possible I'll test dmar and disable_msi, and no dmar and no
> disable_msi. When I first hit issues, I was running without DMAR, but
> also without the above patches. I suppose the non-working permutations
> need to be either fixed or invalidated (or well documented).
> 5) It would be nice if someone with comparable hardware could reproduce
> these issues. FWIW, I can only recreate the crash running windows backup
> to a cifs share. Copying large files doesn't seem to do it. Could also
> be some other interaction going on here that perhaps others aren't
> running - would be happy to compare notes.
>
> Notes:
> This *could* be coincidental, but maybe not...
> With MMAP+NO DMAR + disable_msi there are far fewer ... actually almost
> no... bind error reports... and no bind format error messages. With
> NOMMAP and alternative one there are a few more bind error messages and
> one format error message during the several hours that version was up.
> All other configurations going back perhaps for two weeks have
> significantly more bind error reports - and all versions show increasing
> frequency of bind format errors (IPV6 only) in the roughly 10-15 minutes
> preceding the lockup/crash/interrupt error messages. There are none
> immediately preceding any crash, but perhaps there is some correlation
> between the network errors and bind ipv6 packets.

OK, for now let's make sure this MMAP + NO DMAR + disable_msi is
really really working.
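
To keep track of what is left to test after that, a rough sketch that lists
the MMAP-enabled combinations not yet reported as clean. The option names
are only labels for this sketch, not kernel identifiers, and the single "ok"
entry is the MMAP + NO DMAR + disable_msi=1 run described above.

```python
from itertools import product

# Labels for this sketch only; they are not kernel or module identifiers.
OPTIONS = ("mmap", "dmar", "disable_msi")

# The only combination reported above as running without errors so far:
# MMAP + NO DMAR + disable_msi=1.
reported_ok = {(True, False, True)}


def remaining_mmap_runs():
    """Return MMAP-enabled combinations not yet reported as clean."""
    todo = []
    for combo in product((True, False), repeat=len(OPTIONS)):
        if combo[0] and combo not in reported_ok:
            todo.append(dict(zip(OPTIONS, combo)))
    return todo


if __name__ == "__main__":
    for run in remaining_mmap_runs():
        print(run)
```

The "MMAP + NO DMAR only" case is the entry with dmar and disable_msi both
False.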

Thanks,
Jarek P.