> > The thing is that there's no requirement for an interface as complex as
> > the one you're proposing here. I've talked to a few database people
> > and all they want is to increase the untorn write boundary from "one
> > disc block" to one database block, typically 8kB or 16kB.
> >
> > So they would be quite happy with a much simpler interface where they
> > set the inode block size at inode creation time,
>
> We want to support untorn writes for bdev file operations - how can we set
> the inode block size there? Currently it is based on logical block size.

ioctl(BLKBSZSET), I guess? That currently limits to PAGE_SIZE, but I
think we can remove that limitation with the bs>PS patches.

> > and then all writes to
> > that inode were guaranteed to be untorn. This would also be simpler to
> > implement for buffered writes.
>
> We did consider that. Won't that lead to the possibility of breaking
> existing applications which want to do regular unaligned writes to these
> files? We do know that mysql/innodb does have some "compressed" mode of
> operation, which involves regular writes to the same file which wants untorn
> writes.

If you're talking about "regular unaligned buffered writes", then that
won't break. If you cross a folio boundary, the result may be torn,
but if you're crossing a block boundary you expect that.

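
To make the boundary argument concrete, here is a rough sketch of the
check an application could apply to its own writes. It assumes the untorn
guarantee covers exactly one block of the inode block size; the function
name is made up for illustration and is not an existing kernel or libc
interface.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the byte range [off, off + len) lies
 * entirely inside a single block of `bs` bytes, i.e. the write does not
 * cross a block boundary and could be covered by a per-block guarantee.
 */
static bool fits_in_one_block(uint64_t off, uint64_t len, uint64_t bs)
{
    /* Empty writes, a zero block size and writes larger than a block never qualify. */
    if (bs == 0 || len == 0 || len > bs)
        return false;

    /* The first and last byte of the write must fall in the same block. */
    return (off / bs) == ((off + len - 1) / bs);
}
```

With a 16kB block size, a 16kB write at offset 0 fits, while the same
write at offset 8kB straddles two blocks and may be torn at the boundary,
which is what an application doing unaligned writes already expects.
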
> Furthermore, untorn writes in HW are expensive - for SCSI anyway. Do we
> always want these for such a file?

Do untorn writes actually exist in SCSI? I was under the impression
nobody had actually implemented them in SCSI hardware.

> We saw untorn writes as not being a property of the file or even the inode
> itself, but rather an attribute of the specific IO being issued from the
> userspace application.

The problem is that keeping track of that is expensive for buffered
writes. It's a model that only works for direct IO. Arguably we
could make it work for O_SYNC buffered IO, but that'll require some
surgery.