> On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
>> Kernel 220.127.116.11 (git) with the following patches applied:
>> af_packet.c (tpacket_snd version 3)
>> sky2 fix WARNING at lib/dma-debug.c check_sync
>
> I guess, you meant the "sky2.c receive_copy" patch which you tested
> earlier, or at least you managed to crash DMAR with that patch
> before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,

Yes - sorry, correct - all three patches were in the last run. Previously, I've encountered the crash without these patches.

>> Running with CONFIG_DMAR=n, system is stable.
>> Running with the exact same source but CONFIG_DMAR=y I get the
>> WARNING (see below) after about 36 hours of uptime (has varied from
>> about 24 to about 48):
>> Smolt profile: http://smolt.fedoraproject.org/show?uuid=pub_bb05c701-1e47-4b3c-9fab-54f520f39d79+
>> I'm also attaching dmesg.old (dmesg from the crash).
>> Subsequent to this the system watchdog reboots the system (it's hung).
>> Of interest: each and every time this has happened the system was
>> under heavy RX load (win7 backup to a cifs share hosted on this
>> server). Also, there is always a dhcp exchange of some sort
>> preceding the event.
>> It is possible that the event is re creatable without DMAR enabled,
>> but I have been unsuccessful in doing so.
>
> It would be nice to check now if it's re-creatable without the dhcp
> exchange yet, or at least dhcp through the switch and the router,
> because I suspect there might be something more than a simple drop
> on the switch that affects sky2 stability.

Not sure I can do that. Note that based on the log messages, there were no errors/dropped packets involving dhcp. Moving the dhcp server off of the affected machine is not trivial. The dhcp correlation is based on logged messages preceding each crash. I cannot confirm that they're related, however it's really suspicious. If it helps, HP replaced my unmanaged switch with a managed one so I can see whether there were any switch events logged the next time I have a crash.