Hi,

Since you are looking for comments, I'll mention a loop-related behavior I've
been seeing and see if it draws feedback or proves useful; it can be used to
tickle bad behavior on demand.
loop.c currently uses the page cache interface to do IO to file backed
devices. This works reasonably well for simple things, like mapping an
iso9660 file for direct mount and other read-only workloads. Writing is
somewhat problematic, as anyone who has really used this feature can
attest - it tends to confuse the vm (hello kswapd) since it breaks
dirty accounting and behaves very erratically on writeout. Did I mention
that it's pretty slow as well, for both reads and writes?
It also behaves differently than a real drive. For writes, completions
are done once the data hits the page cache. Since loop queues bios async and
hands them off to a thread, you can have a huge backlog of stuff to do.
It's hard to guarantee data safety for file systems on top of
loop without making it even slower than it currently is.
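To make the completion semantics concrete, here is a toy userspace sketch
(plain C, not the real loop.c code; the structures and names are invented
for illustration): the request is marked complete as soon as its data has
been copied into a cache buffer, while the actual write to the backing
store is left for some later, unrelated flush.

#include <stdio.h>
#include <string.h>

/* Toy model of the current behaviour - NOT real kernel code.  A "bio"
 * is completed as soon as its data sits in a cache buffer, long before
 * anything is written to the backing device. */
struct toy_bio {
	const char *data;
	size_t len;
	int completed;
};

static char page_cache[4096];	/* stand-in for the page cache */
static size_t dirty_bytes;	/* dirty data still waiting for writeout */

static void loop_thread_handle(struct toy_bio *bio)
{
	/* The loop thread copies the data into the cache... */
	memcpy(page_cache + dirty_bytes, bio->data, bio->len);
	dirty_bytes += bio->len;

	/* ...and signals completion right away.  The submitter now thinks
	 * the write is done, but nothing has reached the disk yet -
	 * writeout happens whenever the vm gets around to flushing. */
	bio->completed = 1;
}

int main(void)
{
	struct toy_bio bio = { .data = "important data", .len = 14, .completed = 0 };

	loop_thread_handle(&bio);
	printf("completed=%d, dirty bytes still in cache=%zu\n",
	       bio.completed, dirty_bytes);
	return 0;
}

A file system sitting on top of loop therefore gets completions that say
nothing about what has actually reached the media.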
Back when loop was only used for iso9660 mounting and other simple
things, this didn't matter. Now it's often used in xen (and other)
setups where we do care about performance AND writing. So the below is an
attempt at speeding up loop and making it behave like a real device.
It's a somewhat quick hack and is still missing one piece to be
complete, but I'll throw it out there for people to play with and
comment on.
So how does it work? Instead of punting IO to a thread and passing it
through the page cache, we attempt to send the IO directly to the
filesystem blocks that it maps to. loop maintains a prio tree of known
extents in the file (populated lazily, on demand; a rough sketch of the
idea is at the end of this mail). Advantages
of this approach:
- It's fast, loop will basically work at device speed.
- It's fast, loop doesn't put a huge amount of load on the
system when busy. When I did comparison tests on my notebook with an
external drive, running a simple tiobench on the current in-kernel
loop with a sparse file backing rendered the notebook basically
unusable while the test was ongoing. The remapper version had no more
impact than it did when used directly on the external drive.
- It behaves like a real block device.
- It's easy to support IO barriers, which are needed to ensure data
safety, especially in virtualized setups.
Disadvantages:
- The file block mappings must not change while loop is using the file.
This means that we have to ensure exclusive access to the file and
this is the bit that is currently missing in the implementation. It
would be nice if we could just do this via open(), ideas welcome...
- It'll tie down a bit of memory for the prio tree. This is GREATLY
offset by the reduced page cache footprint, though.
- It cannot be used with the loop encryption stuff. dm-crypt should be
used instead, on top of loop (which, I think, is even the recommended
way to do this today, so not a big deal).
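For the curious, here is a rough userspace sketch of the remapping idea
referred to above. It is not the patch itself - the extent structure, the
flat array standing in for the prio tree, and the fs_lookup_extent()
helper are all invented for illustration. The point is just the shape of
the fast path: look the file offset up in a cache of known extents, ask
the filesystem on a miss, and redirect the IO to the matching offset on
the underlying device.

#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch only - not the actual patch.  A small array stands
 * in for the prio tree, and fs_lookup_extent() stands in for asking the
 * filesystem how a file range maps onto disk blocks.  This only works as
 * long as the mapping doesn't change under us, which is exactly the
 * exclusive-access requirement mentioned above. */

struct extent {
	uint64_t file_start;	/* offset in the backing file (bytes) */
	uint64_t disk_start;	/* offset on the underlying device (bytes) */
	uint64_t len;		/* length of the contiguous run */
};

#define MAX_EXTENTS	128
static struct extent cache[MAX_EXTENTS];
static int nr_extents;

/* Pretend filesystem: the file is laid out contiguously starting at
 * 1 MiB on the device, handed out in 64k extents.  A real version would
 * query the filesystem (bmap/get_block style). */
static struct extent fs_lookup_extent(uint64_t file_off)
{
	uint64_t start = file_off & ~((uint64_t)65536 - 1);
	struct extent e = {
		.file_start = start,
		.disk_start = (1ULL << 20) + start,
		.len = 65536,
	};
	return e;
}

/* Map a file offset to a device offset, filling the cache lazily. */
static uint64_t remap(uint64_t file_off)
{
	struct extent e;
	int i;

	for (i = 0; i < nr_extents; i++) {
		e = cache[i];
		if (file_off >= e.file_start && file_off < e.file_start + e.len)
			return e.disk_start + (file_off - e.file_start);
	}

	/* Miss: ask the filesystem and remember the answer for next time. */
	e = fs_lookup_extent(file_off);
	if (nr_extents < MAX_EXTENTS)
		cache[nr_extents++] = e;

	return e.disk_start + (file_off - e.file_start);
}

int main(void)
{
	/* An IO aimed at file offset 72704 is redirected to the device
	 * offset inside the extent that contains it. */
	printf("file offset 72704 -> device offset %llu\n",
	       (unsigned long long)remap(72704));
	return 0;
}

The patch keeps the extents in a prio tree rather than a flat array and
populates it as IO arrives for ranges it hasn't seen yet, but the
lookup-or-map-then-redirect flow is the same.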