On 2015-05-07 21:01, Sage Weil wrote:
On Thu, 7 May 2015, Zach Brown wrote:
On Thu, May 07, 2015 at 10:26:17AM +1000, Dave Chinner wrote:
On Wed, May 06, 2015 at 03:00:12PM -0700, Zach Brown wrote:
The criteria for using O_NOMTIME is the same as for using O_NOATIME:
owning the file or having the CAP_FOWNER capability. If we're not
comfortable allowing owners to prevent mtime/ctime updates then we
should add a tunable to allow O_NOMTIME. Maybe a mount option?
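For reference, the existing O_NOATIME rule can be seen from userspace. This is only a rough Python sketch of the current behaviour, not anything from the patch: open_noatime is a made-up helper name, and the fallback to a plain open on EPERM is an assumption about what a caller would want. O_NOMTIME itself is only proposed in this thread, so it is not used here.

```python
import os

# O_NOATIME is Linux-specific; treat it as 0 where the constant is missing.
O_NOATIME = getattr(os, "O_NOATIME", 0)

def open_noatime(path):
    """Open path read-only, asking the kernel not to update atime.

    Returns (fd, honoured). honoured is False when the flag was
    unavailable or refused and a plain open was used instead.
    (Hypothetical helper, for illustration only.)
    """
    if O_NOATIME:
        try:
            return os.open(path, os.O_RDONLY | O_NOATIME), True
        except PermissionError:
            # EPERM: the caller neither owns the file nor has CAP_FOWNER.
            pass
    return os.open(path, os.O_RDONLY), False
```

A caller that owns the file should normally get honoured == True on Linux; anyone else gets the fallback descriptor.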

I dislike "turn off safety for performance" options because Joe
SpeedRacer will always select performance over safety.

Well, for ceph there's no safety concern. They never use cmtime in
these files.

So are you suggesting not implementing this and making them rework their
IO paths to avoid the fs maintaining mtime so that we don't give Joe
Speedracer more rope? Or are we talking about adding some speed bumps
that ceph can flip on that might give Joe Speedracer pause?

I think this is the fundamental question: who do we give the ammunition
to, the user or app writer, or the sysadmin?

One might argue that we gave the user a similar power with O_NOATIME (the
power to break applications that assume atime is accurate). Here we give
developers/users the power to not update mtime and suffer the consequences
(like, obviously, breaking mtime-based backups). It should be pretty
obvious to anyone using the flag what the consequences are.
The difference is that the only widely used program that uses atime for anything is Mutt (and many people who don't use Mutt just disable atime updates altogether to improve performance), whereas mtime is used, at the very least, by many backup tools and pretty much all NFSv{3,2} clients, as well as a number of other pieces of software.
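To make the backup concern concrete, here is a minimal Python sketch of how an mtime-based incremental backup would miss a write that does not bump mtime. Everything in it is an assumption for illustration: changed_since stands in for a backup tool's selection logic, and write_preserving_mtime only simulates the effect of the proposed flag by restoring the old timestamp with os.utime after the write (a real O_NOMTIME write would also leave ctime alone, which this simulation does not).

```python
import os
import tempfile

def changed_since(paths, cutoff_ns):
    """Pick files whose mtime is newer than the last backup (hypothetical tool logic)."""
    return [p for p in paths if os.stat(p).st_mtime_ns > cutoff_ns]

def write_preserving_mtime(path, data):
    """Append data, then put the old timestamps back.

    This only simulates a write that does not update mtime.
    """
    st = os.stat(path)
    with open(path, "ab") as f:
        f.write(data)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))

def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "object")
        with open(path, "wb") as f:
            f.write(b"v1")
        cutoff = os.stat(path).st_mtime_ns      # time of the "last backup"
        write_preserving_mtime(path, b"v2")     # data changes, mtime does not
        missed = changed_since([path], cutoff) == []
        return missed, os.path.getsize(path)
```

Here demo() reports that the file grew while the mtime-based selection still skipped it, which is exactly the case a backup tool cannot detect on its own.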

Note that we can suffer similar lapses in mtime with fdatasync followed by
a system crash. And as Andy points out it's semi-broken for writable
mmap. The crash case is obviously a slightly different thing, but the
idea that mtime can't always be trusted certainly isn't crazy talk.

Or, we can be conservative and require a mount option so that the admin
has to explicitly allow behavior that might break some existing
assumptions about mtime/ctime ('-o user_noatime' I guess?).
Personally, I agree that there should be a mount option. However it ends up being controlled, we should make sure to put a big fat warning about it in the manpage.

I'm happy either way, so long as in the end an unprivileged ceph daemon
avoids the useless work. In our case we always own the entire mount/disk,
so a mount option is just fine.
