[PATCH v3 0/3] exec: fix passing of file locks across execve in multithreaded processes

From: Jeff Layton
Date: Fri Sep 14 2018 - 06:53:15 EST


v2: fix displaced_files cleanup in __do_execve_file
v3: fix thread_count handling in unshare syscall
add new release_files_struct helper

The main difference from the earlier set is some cleanup and fixes for
the thread_count handling in the files_struct. The original looked a
bit more ugly without the new helper function, and didn't get the
refcounting right in the unshare case, which was discovered by the
kernel test robot seeing the new WARN pop up during one test.

At this point, I'm mainly interested in whether this is the correct
approach to fix this at all. It is along the lines of what Eric B.
suggested, but I'm not sure whether it may break other cases that I
haven't considered.

-------------------------8<----------------------

A few months ago, Dan reported that when you call execve in process that
is multithreaded, any traditional POSIX locks are silently dropped.

The problem is that we end up unsharing the files_struct from the
process very early during exec, when it looks like it's shared between
tasks. Eventually, when the other, non-exec'ing tasks are killed, we
tear down the old files_struct. That ends up tearing down the old
files_struct, which ends up looking like a close() was issues on each
open fd and that causes the locks to be dropped.

This patchset is a second stab at fixing this issue, this time following
the method suggested by Eric Biederman. The idea here is to move the
unshare_files() call after de_thread(), which helps ensure that we only
unshare the files_struct when it's truly shared between different
processes, and not just when the exec'ing process is multithreaded.

This seems to fix the originally reported problem (now known as xfstest
generic/484), and basic testing doesn't seem to show any issues.

During the original discussion though, Al had mentioned that this could
be problematic due to the fdtable being modifiable by other threads
(or even processes) during the binfmt probe. That may make this idea
non-viable.

I'm also not terribly thrilled with the way this sprinkles the
files_struct->file_lock all over the place. It may be possible to do
some of this with atomic ops if the basic approach turns out to be
reasonable.

Comments and suggestions welcome.

Jeff Layton (3):
exec: separate thread_count for files_struct
exec: delay clone(CLONE_FILES) if task associated with current
files_struct is exec'ing
exec: do unshare_files after de_thread

fs/coredump.c | 2 +-
fs/exec.c | 27 +++++++++++++++++++--------
fs/file.c | 19 +++++++++++++++++--
include/linux/binfmts.h | 1 +
include/linux/fdtable.h | 3 +++
kernel/fork.c | 25 ++++++++++++++++++++-----
6 files changed, 61 insertions(+), 16 deletions(-)

--
2.17.1