Fair enough. That explains the behaviour. Would AIO help here? If we can enqueue the next write before the first one has finished, the disk can start on it immediately without waiting for a revolution.
If you could get them queued at the disk level, the things to watch would be whether the disk can queue requests at all (and whether every controller and driver in the path supports it), how many requests it can queue, and how large each one can be. If they aren't queued at the disk, there is a chance that the machine cannot get the data to the disk fast enough for that next sector.
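To put rough numbers on what missing "that next sector" costs, here is a back-of-the-envelope sketch. The RPM values are only common examples, not anything measured in this thread, and the model assumes the worst case where every write waits one full revolution.

```python
def revolution_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def max_sync_writes_per_sec(rpm):
    """Upper bound on writes/s if each write waits one full revolution."""
    return rpm / 60.0


if __name__ == "__main__":
    # Example spindle speeds (assumed, for illustration only).
    for rpm in (5400, 7200, 15000):
        print(f"{rpm} RPM: {revolution_ms(rpm):.2f} ms per revolution, "
              f"at most {max_sync_writes_per_sec(rpm):.0f} writes/s")
```

So on a 7200 RPM disk, a write that just misses its sector waits about 8.33 ms, which caps back-to-back synchronous writes at roughly 120 per second no matter how small they are.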
Depending on your application, you could always get a small, fast solid-state device (no seek or RPM issues) and use it to keep a journal that could be replayed after an unexpected crash, then just use various syncs to force things to disk at various points.
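A minimal sketch of the journal idea, assuming the journal is just a file on the fast device. The record layout (length plus CRC32 header) and the names `journal_append` and `journal_replay` are made up for illustration; a real application would pick its own format and decide what "replay" means for its data.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def journal_append(fd, payload):
    """Append one record and force it to stable storage before returning."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    view = memoryview(record)
    while view:  # os.write may write fewer bytes than requested
        written = os.write(fd, view)
        view = view[written:]
    os.fsync(fd)  # the "sync" step: do not return until the data is on disk


def journal_replay(path):
    """Return all intact records, stopping at the first torn or corrupt one."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # clean end of journal, or a torn header
            length, crc = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # record was cut off by the crash
            records.append(payload)
    return records


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    journal_append(fd, b"update 1")
    journal_append(fd, b"update 2")
    os.close(fd)
    print(journal_replay(path))
```

The CRC is there so that a record cut short by a crash is detected and dropped on replay instead of being applied half-written. Note that `os.fsync` only asks the OS to flush; whether the device itself honours that depends on the hardware and its write cache settings.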