On Thu, Jan 07, 2010 at 06:11:34PM -0500, Michael Breuer wrote:

Results:
...
Jan 7 15:44:44 mail kernel: Pid: 0, comm: swapper Tainted: G W
...

BTW, was there any other oops saved before this one?

Nope - just the one.

--- adapter dead after this --- rebooted.

* no MMAP; alternative 1 patch, mtu=1500; no errors; sustained transfer
rates about 25% lower than what I saw with mmap enabled...(before MMAP
enabled crashed).

?? Read below...

* no MMAP mtu=9000; ran ok at low transfer rates - when high rates
kicked in, got the sky2 interrupt error & things went south:
Jan 7 15:09:28 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
Jan 7 15:09:28 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
After this, remote connections broke and I rebooted... decided to rerun
w/o MMAP again before going back to MMAP and trying those other sky2
options...
* Retest of no MMAP + Alternative 1 - just to confirm consistency.
Worked - no errors. Only version so far that allows the win7 backup to
complete.

??? Hmm... Alternative 1 or 2 doesn't even compile into when no MMAP,
so it definitely needs re-retesting ;-)

I see your point. I'm pretty sure that run failed miserably. Perhaps something else is going on - some sort of intermittent thing that just got caught there... have one thought - see below.

* MMAP + NO DMAR + disable_msi=1... also works w/o errors... leaving
this one running for a while - also completed a backup successfully.
Fastest of the lot... about 3x faster than any other version, working or
not.

Very interesting. It would be nice to give it a really long try, and
if still true, try MMAP + NO DMAR only.

Still up - no kernel errors reported. There was a large dropped packet rate (RX) which seems to actually correlate with DNS format error messages (ipv6 only). I spent some time tracking those down. Interestingly, most pointed back to one netblock & one ISP (0qf.ru). I blocked that domain and the errors expectedly dropped - as did the RX dropped packet rate. Since booting this version yesterday eth0 shows 1752944 dropped packets. 1752939 of those happened before I blocked the domain about 8 hours ago. I have run load tests since as well.
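
As a quick sanity check on the eth0 drop counts quoted above, the arithmetic can be sketched in a few lines of Python. The two counter values are the ones given in the message. The variable names are only illustrative and do not come from any tool.

```python
# RX dropped-packet counters for eth0, as quoted in the message.
dropped_total = 1752944          # dropped since booting this kernel
dropped_before_block = 1752939   # dropped before the domain was blocked

# Drops accumulated in the roughly 8 hours since the block.
dropped_since_block = dropped_total - dropped_before_block

# Fraction of all drops that happened before the block.
share_before_block = dropped_before_block / dropped_total

print(f"dropped since block: {dropped_since_block}")
print(f"share before block:  {share_before_block:.4%}")
```

That works out to 5 dropped packets since the block, so nearly all of the RX drops predate it.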
I'm leaving this one running for now. Not retesting jumbo for now. Be
happy to help dig further.

Tentative recommendations:
1) The af alternative patch seems rather necessary. First alternative
seems to be working, I'd suggest that be submitted and backported to
2.6.32.
2) Steven's pskb_may_pull patch also ought to be included and backported.
3) Jumbo frame support for yukon2 should probably be disabled until/if
fixed.
4) When possible I'll test dmar and disable_msi, and no dmar and no
disable_msi. When I first hit issues, I was running without DMAR, but
also without the above patches. I suppose the non-working permutations
need to be either fixed or invalidated (or well documented).
5) It would be nice if someone with comparable hardware could reproduce
these issues. FWIW, I can only recreate the crash running windows backup
to a cifs share. Copying large files doesn't seem to do it. Could also
be some other interaction going on here that perhaps others aren't
running - would be happy to compare notes.

Notes:
This *could* be coincidental, but maybe not...
With MMAP+NO DMAR + disable_msi there are far fewer ... actually almost
no... bind error reports... and no bind format error messages. With
NOMMAP and alternative one there are a few more bind error messages and
one format error message during the several hours that version was up.
All other configurations going back perhaps for two weeks have
significantly more bind error reports - and all versions show increasing
frequency of bind format errors (IPV6 only) in the roughly 10-15 minutes
preceding the lockup/crash/interrupt error messages. There are none
immediately preceding any crash, but perhaps there is some correlation
between the network errors and bind ipv6 packets.

OK, for now let's make sure this MMAP + NO DMAR + disable_msi is
really really working.

Still running :)

Thanks,
Jarek P.