Yeah, at the low end it may make sense to do the 512B write via DIO.
OTOH, at the high end, syncing many redo log FS blocks at once can be
more efficient.
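
For the low end, something like this untested sketch is what I'd expect
-- one sector-aligned O_DIRECT write per record, assuming a 512B logical
sector size (the helper name is made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define REDO_REC_SIZE	512	/* assumed logical sector size */

/* Land one redo record (@len <= 512 bytes) at @off with a single
 * sector-sized direct write. */
static int write_redo_record_dio(const char *path, off_t off,
				 const void *rec, size_t len)
{
	void *buf;
	int fd, ret = -1;

	/* O_DIRECT wants a sector-aligned buffer, offset, and length. */
	if (posix_memalign(&buf, REDO_REC_SIZE, REDO_REC_SIZE))
		return -1;
	memset(buf, 0, REDO_REC_SIZE);
	memcpy(buf, rec, len);

	fd = open(path, O_WRONLY | O_DIRECT | O_DSYNC);
	if (fd < 0)
		goto out_free;

	/* One sector, straight to the device: no pagecache, and the
	 * write can't be torn on a 512B-sector disk. */
	if (pwrite(fd, buf, REDO_REC_SIZE, off) == REDO_REC_SIZE)
		ret = 0;
	close(fd);
out_free:
	free(buf);
	return ret;
}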
From what I have heard, this was attempted before (using DIO) by some
vendor, but it did not come to much.
So it seems that we are stuck with this redo log limitation.
Let me know if you have any other ideas to avoid large atomic writes...
From the description it sounds like the redo log consists of 512B blocks
that describe small changes to the 16K table file pages. If they're
issuing 16K atomic writes to get each of those 512B redo log records to
disk, it's no wonder that cranks up the overhead substantially.
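
If I'm reading the pattern right, it amounts to something like the
following untested sketch. RWF_ATOMIC here is the Linux 6.11+ pwritev2()
flag; the 16K atomic write unit and the helper names are assumptions on
my part:

#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_ATOMIC
#define RWF_ATOMIC	0x00000040	/* untorn-write flag, Linux 6.11+ uapi */
#endif

#define FS_BLOCK_SIZE	16384	/* assumed redo log FS block size */
#define REDO_REC_SIZE	512

/*
 * Rewrite the whole 16K block that contains the record at @rec_off.
 * @fd is opened with O_DIRECT (RWF_ATOMIC currently requires it) and
 * @blockbuf is an aligned copy of the block's current contents.
 */
static int write_record_atomic(int fd, off_t rec_off,
			       const void *rec, void *blockbuf)
{
	off_t blk_off = rec_off & ~((off_t)FS_BLOCK_SIZE - 1);
	struct iovec iov = {
		.iov_base = blockbuf,
		.iov_len  = FS_BLOCK_SIZE,
	};

	memcpy((char *)blockbuf + (rec_off - blk_off), rec, REDO_REC_SIZE);

	/* 16K of untorn I/O to land 512 bytes: 32x write amplification
	 * before the device even sees the request. */
	if (pwritev2(fd, &iov, 1, blk_off, RWF_ATOMIC) != FS_BLOCK_SIZE)
		return -1;
	return 0;
}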
Also, replaying those tiny updates through the pagecache beats issuing a
bunch of tiny nonlocalized writes, since writeback can merge updates that
land in the same page.
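
Roughly this shape, i.e. a buffered pwrite() per record and one fsync()
at the end -- sketch only, with a made-up record struct:

#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

/* Made-up parsed redo record: patch @len bytes at @offset. */
struct redo_rec {
	off_t	offset;
	size_t	len;		/* <= 512 */
	char	data[512];
};

static int replay_records(int tablefd, const struct redo_rec *recs,
			  unsigned int nr)
{
	unsigned int i;

	/* Buffered writes: the pagecache absorbs many tiny updates to
	 * the same 16K page and writeback merges neighbors, instead of
	 * one tiny random disk write per record. */
	for (i = 0; i < nr; i++)
		if (pwrite(tablefd, recs[i].data, recs[i].len,
			   recs[i].offset) != (ssize_t)recs[i].len)
			return -1;

	/* One flush persists the whole batch. */
	return fsync(tablefd);
}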
For the first case (logging) I don't know why they need atomic writes --
512B redo log records can't be torn because they're single-sector writes.
The second case (replay) might be better done with exchange-range.
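
By exchange-range I mean something like the sketch below: stage the
updated page in a scratch file, then swap it into the table file
atomically. I'm quoting XFS_IOC_EXCHANGE_RANGE and struct
xfs_exchange_range from memory (Linux 6.10+), so treat the uapi details
as assumptions and check them against the real header:

#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#ifndef XFS_IOC_EXCHANGE_RANGE
/* From the xfs uapi as I remember it -- double-check against
 * xfs_fs.h on a 6.10+ kernel. */
struct xfs_exchange_range {
	int32_t		file1_fd;
	uint32_t	pad;		/* must be zeroes */
	uint64_t	file1_offset;	/* file1 offset, bytes */
	uint64_t	file2_offset;	/* file2 offset, bytes */
	uint64_t	length;		/* bytes to exchange */
	uint64_t	flags;
};
#define XFS_EXCHANGE_RANGE_DSYNC	(1ULL << 1)	/* flush both files first */
#define XFS_IOC_EXCHANGE_RANGE	_IOW('X', 129, struct xfs_exchange_range)
#endif

/* Swap one updated 16K page from @stagingfd into @tablefd at @off. */
static int commit_page_exchange(int tablefd, int stagingfd,
				uint64_t off, uint64_t len)
{
	struct xfs_exchange_range xchg;

	memset(&xchg, 0, sizeof(xchg));
	xchg.file1_fd = stagingfd;	/* donor file with the new page */
	xchg.file1_offset = off;
	xchg.file2_offset = off;	/* same offset in the table file */
	xchg.length = len;
	xchg.flags = XFS_EXCHANGE_RANGE_DSYNC;

	/* Commits atomically: after a crash the table file shows either
	 * the old page or the new one, never a mix. */
	return ioctl(tablefd, XFS_IOC_EXCHANGE_RANGE, &xchg);
}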