Re: [Patch] vfs: increase scope of critical locked path in fget_light to avoid race

From: Al Viro
Date: Fri May 20 2005 - 16:30:46 EST


On Fri, May 20, 2005 at 11:25:50AM -0400, Neil Horman wrote:
> I don't have the complete race scenario, just a stack that suggests that
> files->fd was corrupted. This problem isn't recreatable at will (yet), so this
> is really based on a thought experiment more than anything else. The conditions
> that I was envisioning was a multithreaded application in which two threads
> modified the same file descriptor at the same time. Its my understanding, from
> the way I read the code that the ref count on a file_struct will still be one
> for a multithreaded application, and as such it would be possible, using the
> fget_light routine for one thread to be be preforming an operation on an
> descriptor in the fd array, while another thread preformed another operation

Incorrect. References to files_struct are held by task_struct. Kernel
stack is determined by task_struct. So your two threads would have
to share task_struct (i.e. be purely userland ones) and could not
run in the kernel at the same time for very obvious reasons.

The rules are simple:
* all access to files_struct is done from upper-half (i.e. is
process-synchronous).
* the only files_struct you can modify is *(current->files)
* each task_struct that has ->files pointing to given files_struct
contributes 1 to ->count of that files_struct. There might be other holders
of temporary references and they also contribute to ->count.
* all changes of task->files itself are process-synchronous. Only
two kinds of changes are possible:
1) current->files can be set to NULL. That drops a reference to
original files_struct.
2) current->files can be replaced with a pointer to a new copy of
previous files_struct. This operations drops a reference to old one and
sets the refcount on a copy to 1. It could either be done explicitly (when
unshare(2) gets merged into Linus' tree) or implicitly at the clone()/fork()
time. In the latter case that's done by parent to child before the child
gets a chance to run.
* at task creation time, child inherits ->files from parent; that
acquires a new reference to it. That might be followed by implicit unshare()
(see above). fork(2) always unshares ->files, clone(2) does that unless
CLONE_FILES had been passed to it in flags.

IOW, the only way for two tasks to have ->files pointing to the same object
is to have it unchanged all the way back to common ancestor. In particular,
if current->files->count is 1, we know that no other task has ->files pointing
to our files_struct and that will remain true until we call clone(). It does
*not* mean that current->files->count will remain 1; somebody might acquire
a temporary reference to our files_struct. However, we are guaranteed that
all such references will be used only for read-only access (that happens,
e.g., when somebody does ls /proc/<our_pid>/fd - they will grab a reference
to our ->files and go looking at the descriptor table; they are not allowed
to change it, though).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/