Yeah, at the low end it may make sense to do the 512B write via DIO.
OTOH, at the high end, syncing many redo log FS blocks at once can be
more efficient.
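
For the low end, something like this untested sketch is what I'd expect
-- one sector-aligned O_DIRECT write per record, assuming a 512B logical
sector size (the helper name is made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define REDO_REC_SIZE	512	/* assumed logical sector size */

/* Land one redo record (@len <= 512 bytes) at @off with a single
 * sector-sized direct write. */
static int write_redo_record_dio(const char *path, off_t off,
				 const void *rec, size_t len)
{
	void *buf;
	int fd, ret = -1;

	/* O_DIRECT wants a sector-aligned buffer, offset, and length. */
	if (posix_memalign(&buf, REDO_REC_SIZE, REDO_REC_SIZE))
		return -1;
	memset(buf, 0, REDO_REC_SIZE);
	memcpy(buf, rec, len);

	fd = open(path, O_WRONLY | O_DIRECT | O_DSYNC);
	if (fd < 0)
		goto out_free;

	/* One sector, straight to the device: no pagecache, and the
	 * write can't be torn on a 512B-sector disk. */
	if (pwrite(fd, buf, REDO_REC_SIZE, off) == REDO_REC_SIZE)
		ret = 0;
	close(fd);
out_free:
	free(buf);
	return ret;
}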
From what I have heard, this was attempted before (using DIO) by some
vendor, but it did not come to much.
So it seems that we are stuck with this redo log limitation.
Let me know if you have any other ideas to avoid large atomic writes...
From the description it sounds like the redo log consists of 512B blocks
that describe small changes to the 16K table file pages. If they're
issuing 16K atomic writes to get each of those 512B redo log records to
disk, it's no wonder that cranks up the overhead substantially.
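
If I'm reading the pattern right, it amounts to something like the
following untested sketch. RWF_ATOMIC here is the Linux 6.11+ pwritev2()
flag; the 16K atomic write unit and the helper names are assumptions on
my part:

#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_ATOMIC
#define RWF_ATOMIC	0x00000040	/* untorn-write flag, Linux 6.11+ uapi */
#endif

#define FS_BLOCK_SIZE	16384	/* assumed redo log FS block size */
#define REDO_REC_SIZE	512

/*
 * Rewrite the whole 16K block that contains the record at @rec_off.
 * @fd is opened with O_DIRECT (RWF_ATOMIC currently requires it) and
 * @blockbuf is an aligned copy of the block's current contents.
 */
static int write_record_atomic(int fd, off_t rec_off,
			       const void *rec, void *blockbuf)
{
	off_t blk_off = rec_off & ~((off_t)FS_BLOCK_SIZE - 1);
	struct iovec iov = {
		.iov_base = blockbuf,
		.iov_len  = FS_BLOCK_SIZE,
	};

	memcpy((char *)blockbuf + (rec_off - blk_off), rec, REDO_REC_SIZE);

	/* 16K of untorn I/O to land 512 bytes: 32x write amplification
	 * before the device even sees the request. */
	if (pwritev2(fd, &iov, 1, blk_off, RWF_ATOMIC) != FS_BLOCK_SIZE)
		return -1;
	return 0;
}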
Also, replaying those tiny updates through the pagecache beats issuing a
bunch of tiny nonlocalized writes, since writeback can merge updates that
land in the same page.
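
Roughly this shape, i.e. a buffered pwrite() per record and one fsync()
at the end -- sketch only, with a made-up record struct:

#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

/* Made-up parsed redo record: patch @len bytes at @offset. */
struct redo_rec {
	off_t	offset;
	size_t	len;		/* <= 512 */
	char	data[512];
};

static int replay_records(int tablefd, const struct redo_rec *recs,
			  unsigned int nr)
{
	unsigned int i;

	/* Buffered writes: the pagecache absorbs many tiny updates to
	 * the same 16K page and writeback merges neighbors, instead of
	 * one tiny random disk write per record. */
	for (i = 0; i < nr; i++)
		if (pwrite(tablefd, recs[i].data, recs[i].len,
			   recs[i].offset) != (ssize_t)recs[i].len)
			return -1;

	/* One flush persists the whole batch. */
	return fsync(tablefd);
}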
For the first case (logging) I don't know why they need atomic writes --
512B redo log records can't be torn because they're single-sector writes.
The second case (replay) might be better done with exchange-range.
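
By exchange-range I mean something like the sketch below: stage the
updated page in a scratch file, then swap it into the table file
atomically. I'm quoting XFS_IOC_EXCHANGE_RANGE and struct
xfs_exchange_range from memory (Linux 6.10+), so treat the uapi details
as assumptions and check them against the real header:

#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#ifndef XFS_IOC_EXCHANGE_RANGE
/* From the xfs uapi as I remember it -- double-check against
 * xfs_fs.h on a 6.10+ kernel. */
struct xfs_exchange_range {
	int32_t		file1_fd;
	uint32_t	pad;		/* must be zeroes */
	uint64_t	file1_offset;	/* file1 offset, bytes */
	uint64_t	file2_offset;	/* file2 offset, bytes */
	uint64_t	length;		/* bytes to exchange */
	uint64_t	flags;
};
#define XFS_EXCHANGE_RANGE_DSYNC	(1ULL << 1)	/* flush both files first */
#define XFS_IOC_EXCHANGE_RANGE	_IOW('X', 129, struct xfs_exchange_range)
#endif

/* Swap one updated 16K page from @stagingfd into @tablefd at @off. */
static int commit_page_exchange(int tablefd, int stagingfd,
				uint64_t off, uint64_t len)
{
	struct xfs_exchange_range xchg;

	memset(&xchg, 0, sizeof(xchg));
	xchg.file1_fd = stagingfd;	/* donor file with the new page */
	xchg.file1_offset = off;
	xchg.file2_offset = off;	/* same offset in the table file */
	xchg.length = len;
	xchg.flags = XFS_EXCHANGE_RANGE_DSYNC;

	/* Commits atomically: after a crash the table file shows either
	 * the old page or the new one, never a mix. */
	return ioctl(tablefd, XFS_IOC_EXCHANGE_RANGE, &xchg);
}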