On Fri, Jan 22, 2010 at 05:14:58PM -0500, Michael Breuer wrote:
> On 1/22/2010 4:53 PM, Jarek Poplawski wrote:
>> On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
>>> Kernel 2.6.32.4 (git) with the following patches applied:
>>> af_packet.c (tpacket_snd version 3)
>>> sky2.c pskb_may_pull
>>> sky2 fix WARNING at lib/dma-debug.c check_sync
>>
>> I guess, you meant the "sky2.c receive_copy" patch which you tested
>> earlier, or at least you managed to crash DMAR with that patch
>> before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
>> right?
>
> Yes - sorry, correct - all three patches were in the last run.
>
>>> Previously, I've encountered the crash without these patches.
>>
>> OK, thanks for testing - it's really very helpful, and supports
>> David's opinion that dmar is a different problem.
>> ...
>>> Not sure I can do that. Note that based on the log messages, there
>>> were no errors/dropped packets involving dhcp. Moving the dhcp
>>> server off of the affected machine is not trivial. The dhcp
>>> correlation is based on logged messages preceding each crash. I
>>> cannot confirm that they're related, however it's really suspicious.
>>> If it helps, HP replaced my unmanaged switch with a managed one so I
>>> can see whether there were any switch events logged the next time I
>>> have a crash.
>>> At this point, it seems the following is required to trigger the crash:
>>> 1) Uptime of 24-36 hours
>>> 2) High RX load on server (cifs traffic is what I've triggered it with).
>>> 3) Normal DHCP traffic.
>>
>> Do you mean you got these crashes with the new switch too, and this
>> switch doesn't drop DHCP at all? (Otherwise, let's try this switch
>> first.)
>
> Nope - just got the new switch. Crash was old switch. That said, I
> don't think (based on the log messages) that the dhcpoffer packet drop
> was happening prior to the crash. I also can't fathom why a DHCPOFFER
> packet dropped after leaving the server would have any bearing on the
> issue.
Jarek P.