Re: [RFC v10][PATCH 08/13] Dump open file descriptors

From: Linus Torvalds
Date: Mon Dec 01 2008 - 16:02:48 EST




On Mon, 1 Dec 2008, Dave Hansen wrote:
>
> Why is this done in two steps? It first grabs a list of fd numbers
> which needs to be validated, then goes back and turns those into 'struct
> file's which it saves off. Is there a problem with doing that
> fd->'struct file' conversion under the files->file_lock?

Umm, why do we even worry about this?

Wouldn't it be much better to make sure that all other threads are
stopped before we snapshot, and if we cannot account for some thread (ie
there's some elevated count in the fs/files/mm structures that we cannot
see from the threads we've stopped), just refuse to dump.

There is no sane dump from a multi-threaded app that shares resources
without that kind of serialization _anyway_, so why even try?

In other words: any races in dumping are fundamental _bugs_ in the dumping
at a much higher level. There's absolutely no point in trying to make
something like "dump open fd's" be race-free, because if there are other
people that are actively accessing the 'files' structure concurrently, you
had a much more fundamental bug in the first place!

So do things more like the core-dumping does: make sure that all other
threads are quiescent first!

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/