Re: [PATCH V2 2/2] fs: print a message when freezing/unfreezing filesystems

From: Jan Kara
Date: Mon May 19 2014 - 05:43:26 EST


On Fri 16-05-14 10:11:56, Dave Chinner wrote:
> On Fri, May 16, 2014 at 01:19:09AM +0200, Mateusz Guzik wrote:
> > On Fri, May 16, 2014 at 08:51:41AM +1000, Dave Chinner wrote:
> > > On Fri, May 16, 2014 at 12:34:40AM +0200, Mateusz Guzik wrote:
> > > > On Fri, May 16, 2014 at 08:21:35AM +1000, Dave Chinner wrote:
> > > > > > IOW, a new column in mountinfo. For frozen filesystems it would contain
> > > > > > 'frozen_by=[%s]:[%d]' (escaped comm, pid).
> > > > >
> > > > > I really don't see that the process that froze the filesystem is
> > > > > particularly useful - in many cases that process is long gone (e.g.
> > > > > fsfreeze is being used to allow a HW array to take a snapshot). Just
> > > > > the fact it is in the process of freezing (if stuck, stack trace in
> > > > > sysrq-w should be present) or frozen (freezing process may be long
> > > > > gone, and is mostly irrelevant because you're now tracking down why
> > > > > a thaw hasn't happened)...
> > > >
> > > > There are daemons which perform freezing and unfreezing on their own.
> > > > Thus storing the name along with pid helps to determine whether someone
> > > > went behind such daemon's back, or maybe it's the daemon which "forgot" to
> > > > unfreeze after all.
> > >
> > > Such a daemon should be logging the fact that it's freezing and
> > > thawing the filesystem. The kernel is not the place to track what
> > > buggy userspace applications are doing wrong.
> > >
> >
> > Except there is no log entry if /var got frozen (and this is not an
> > imaginary example).
>
> Freezing the filesystem that the freezing daemon logs to is, well, a
> major application architecture fail. Sorry, catering for the lowest
> common denominator (i.e. stupidity) is not a valid argument for
> adding stuff to the kernel....

Sure, it's not a good architecture, but it happens, either because of a bug
or because of a wrong design. So you need to debug it, and traces from
sysrq-w don't tell you who froze the filesystem. Currently you have to use
tracepoints or similar tools to find that out (e.g. in one case I was
debugging, it was rpm running a post-install script that froze the fs;
believe me, that was really unexpected :)). But tracepoints aren't useful
after the fact, so it would sometimes help to be able to find out
afterwards who froze the fs (PID and command name, to help in situations
where the process isn't running anymore). Since this is mostly debugging
stuff, I'd be OK with dumping this information on a sysrq request or, as
Ted suggested, from some fs-freeze hang check timer... Hmm?

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR