On Fri, Jan 22, 2010 at 05:14:58PM -0500, Michael Breuer wrote:
> On 1/22/2010 4:53 PM, Jarek Poplawski wrote:
>> On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
>>> Kernel 18.104.22.168 (git) with the following patches applied:
>>> af_packet.c (tpacket_snd version 3)
>>> sky2 fix WARNING at lib/dma-debug.c check_sync
>> I guess, you meant the "sky2.c receive_copy" patch which you tested
>> earlier, or at least you managed to crash DMAR with that patch
>> before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
> Yes - sorry, correct - all three patches were in the last run.
>>> Previously, I've encountered the crash without these patches.
>> OK, thanks for testing - it's really very helpful, and supports
>> David's opinion that dmar is a different problem.
>>> Not sure I can do that. Note that based on the log messages, there
>>> were no errors/dropped packets involving dhcp. Moving the dhcp
>>> server off of the affected machine is not trivial. The dhcp
>>> correlation is based on logged messages preceding each crash. I
>>> cannot confirm that they're related, however it's really suspicious.
>>> If it helps, HP replaced my unmanaged switch with a managed one so I
>>> can see whether there were any switch events logged the next time I
>>> have a crash.
>>> At this point, it seems the following is required to trigger the crash:
>>> 1) Uptime of 24-36 hours
>>> 2) High RX load on server (cifs traffic is what I've triggered it with).
>>> 3) Normal DHCP traffic.
>> Do you mean you got these crashes with the new switch too, and this
>> switch doesn't drop DHCP at all? (Otherwise, let's try this switch
> Nope - just got the new switch. Crash was old switch. That said, I
> don't think (based on the log messages) that the dhcpoffer packet drop
> was happening prior to the crash. I also can't fathom why a DHCPOFFER
> packet dropped after leaving the server would have any bearing on the
> issue.