We have two database servers which freeze up during heavy IO load. The
machines themselves remain responsive, but the mysqld processes are
permanently locked, unkillable even with kill -9. I can't restart MySQL
without rebooting the machines.
I can reasonably rule out hardware, since this is happening in the
same way on two identical machines.
I'd like to know how I can debug this, to file a proper bug report.
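One possible starting point, as a rough sketch only: a process that survives
kill -9 is usually stuck in uninterruptible sleep (state D) inside the
kernel, so listing D-state tasks with their wait channel may show where
mysqld is blocked. The script below assumes a Linux /proc layout with
/proc/<pid>/stat and /proc/<pid>/wchan; the function names are made up for
illustration, and it only looks at top-level process entries, not the
per-thread entries under /proc/<pid>/task.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen].strip())
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file, returning '' if it vanished or is unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return ""


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm, wchan) for processes in uninterruptible sleep (D)."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue  # process exited between listdir() and the read
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            found.append((pid, comm, wchan or "?"))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid}\t{comm}\t{wchan}")
```

Run as root while the hang is in progress. The wait channel is only a hint
and may be empty or "0" on some kernels, so the output would supplement a
bug report rather than pinpoint the cause.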
The hardware/software stack is:
- Dual Opteron 246, SMP kernel, w/ NUMA
- 9 GB of memory (4GB in one zone, 5GB in the other)
- MySQL, running mostly InnoDB, but some MyISAM
- MegaRAID raid-10
- device mapper
- XFS (used as both O_DIRECT from InnoDB and regularly from MyISAM)
At this point I'm going to try changing different variables on
different machines to isolate the cause, but it's a slow process.
The variations I plan to test are:
- on raw partitions, instead of device mapper
- ext3 instead of XFS
- not using O_DIRECT