Re: [RESEND] file operations: release can race with read/write?

From: Andrew Morton
Date: Wed Oct 17 2007 - 15:17:34 EST


On Wed, 17 Oct 2007 11:30:44 -0700 Roland Dreier <rdreier@xxxxxxxxx> wrote:

> [Resending, directly to a couple likely suspects this time, in the
> hope of getting a reply... thanks]
>
> I have a question about the synchronization of file_operations: is it
> intended that the .release method of a file can be called while a
> .read or .write method is still running for that file?
>
> The reason I ask is that I've seen a crash in practice that appears to
> be caused by this race, and I'm wondering whether the correct fix is
> to add synchronization between .read/.write and .release to the
> driver's methods (this is a character device), or whether the core
> kernel is supposed to provide this synchronization.
>
> I've written a quick test case that seems to show this happen in
> practice, and reading the code in fs/open.c and fs/read_write.c I see
> no reason why this race can't happen: sys_read() and sys_write() just
> do fget_light(), which will not increment the file's reference count
> on the fast path, so a racing sys_close() from another thread could do
> the final fput() that ends up calling the file's .release method
> before the read or write has finished.

After pthread_create()'s clone() you should have files->count==2, so
fget_light() will do the full atomic_inc_not_zero() thing?

> I think the actual file data structure is OK, because it is freed with
> RCU, so it won't be freed too soon. But a .release method may free
> other driver-internal state that is still in use because of this race.
>
> It seems that we wouldn't want to give up the fget_light()
> optimization in read/write, so probably the right place to handle this
> is in the driver. But on the other hand, this is kind of a subtle
> booby trap that has been laid, and maybe there is a clever way to
> handle this. So I would appreciate guidance from smarter people.
>
> Thanks,
> Roland
>
>
> BTW, here's the test case -- it seems pretty conclusive to me, but
> maybe I'm all wet and misunderstanding things. There's a kernel
> module that just prints "Write raced with close?" if it sees the race,
> and a userspace driver that just spawns two threads, one that does
> open/close in a loop and one that does write in a loop. Running this
> on my machine, I see several race messages get printed.
>

That seems pretty conclusive. I haven't actually thought about this yet -
I just wanted to clarify that fget_light() thing.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/