Re: [PATCH] coredump/fcntl: Add FD_CLOBCOR flag to close fd before dumping core
From: Al Viro
Date: Thu Jun 18 2026 - 00:31:35 EST
On Thu, Jun 18, 2026 at 11:07:00AM +0800, Xin Zhao wrote:
> A coredump typically takes some time to complete. If we happen to hold a
> write lock with flock just before triggering the coredump, that write lock
> will not be released during the entire coredump process. As a result,
> other processes attempting to acquire the same write lock may experience
> significant delays.
>
> To address this, we introduce the F_[GET|SET]FD_EX fcntl operation and the
> FD_CLOBCOR flag, allowing coredump_wait() to release any file descriptors
> marked with FD_CLOBCOR. We can also assign the FD_CLOBCOR flag to specific
> shared memory segments, preventing the coredump from including shared
> memory that we are not interested in, thereby reducing both the coredump
> duration and the size of the core file.
>
> We actually considered using signals that generate coredumps to perform
> the actions we wanted in user space. However, since other threads within
> the process are not frozen when handling these signals, indiscriminately
> closing an fd can lead to concurrency issues. For example, if the thread
> that triggered the coredump closes the fd in the signal handler while
> other threads are using the resources associated with that fd, it could
> cause secondary corruption of the coredump state.
>
> Signed-off-by: Xin Zhao <jackzxcui1989@xxxxxxx>

No. Leaving aside the unasked-for overhead for every process on every system,
whether they are interested in this "feature" or not, this

> +static struct fdtable *close_files_before_core(struct files_struct *files)
> +{
> +	/*
> +	 * It is safe to dereference the fd table without RCU or
> +	 * ->file_lock because this is the last reference to the
> +	 * files structure.
> +	 */
> +	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
> +	unsigned int i, j = 0;
> +
> +	for (;;) {
> +		unsigned long set;
> +
> +		i = j * BITS_PER_LONG;
> +		if (i >= fdt->max_fds)
> +			break;
> +		set = fdt->open_fds[j++];
> +		while (set) {
> +			if (set & 1 && close_before_core(i, files)) {
> +				struct file *file = fdt->fd[i];
> +
> +				if (file) {
> +					filp_close(file, files);
> +					cond_resched();
> +				}
> +			}
> +			i++;
> +			set >>= 1;
> +		}
> +	}

is just plain wrong. You are leaving references in that descriptor table,
whether you've closed them or not. It *can't* be right - no matter what
you do after having called that, you will either leak file references
for ones that were not closed or eat double-free for ones that were.

Have you actually tested that patch?

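To make the objection concrete, here is a minimal userspace sketch of the
refcounting problem. Everything in it is a toy model: toy_file, toy_fdtable,
toy_put() and the other toy_* helpers are hypothetical names invented for
illustration, not kernel code. It assumes that later teardown of the table
drops one reference for every slot that is still populated, and it counts a
put on an already-released file instead of actually freeing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration; not the kernel's struct file or fdtable. */
#define TOY_MAX_FDS 4

struct toy_file {
	int refcount;	/* references currently held on this file */
	int over_puts;	/* puts attempted after refcount reached zero */
};

struct toy_fdtable {
	struct toy_file *fd[TOY_MAX_FDS];
};

struct toy_result {
	int leaked;	/* references still held at the end */
	int over_puts;	/* releases of an already-released file */
};

/* Drop one reference; record, rather than perform, a double release. */
static void toy_put(struct toy_file *f)
{
	assert(f != NULL);
	if (f->refcount == 0)
		f->over_puts++;
	else
		f->refcount--;
}

/* Shape of the quoted loop: close marked files, leave the slots populated. */
static void toy_close_marked_in_place(struct toy_fdtable *t, unsigned int marked)
{
	for (unsigned int i = 0; i < TOY_MAX_FDS; i++)
		if ((marked & (1u << i)) && t->fd[i])
			toy_put(t->fd[i]);
}

/* Alternative: detach the file from its slot before dropping the reference. */
static void toy_close_marked_detached(struct toy_fdtable *t, unsigned int marked)
{
	for (unsigned int i = 0; i < TOY_MAX_FDS; i++) {
		struct toy_file *f;

		if (!(marked & (1u << i)))
			continue;
		f = t->fd[i];
		t->fd[i] = NULL;
		if (f)
			toy_put(f);
	}
}

/* Assumed final teardown: drop whatever the table still points at. */
static void toy_teardown(struct toy_fdtable *t)
{
	for (unsigned int i = 0; i < TOY_MAX_FDS; i++) {
		struct toy_file *f = t->fd[i];

		t->fd[i] = NULL;
		if (f)
			toy_put(f);
	}
}

/* Two open files, fd 0 marked for closing; report leaks and over-puts. */
static struct toy_result toy_run(int detach, int teardown)
{
	struct toy_file files[2] = { { 1, 0 }, { 1, 0 } };
	struct toy_fdtable t = { { &files[0], &files[1], NULL, NULL } };
	struct toy_result r = { 0, 0 };

	if (detach)
		toy_close_marked_detached(&t, 1u);
	else
		toy_close_marked_in_place(&t, 1u);
	if (teardown)
		toy_teardown(&t);

	for (int i = 0; i < 2; i++) {
		r.leaked += files[i].refcount;
		r.over_puts += files[i].over_puts;
	}
	return r;
}
```

In this model, toy_run(0, 1) releases the closed file a second time during
teardown, toy_run(0, 0) leaves the unclosed file's reference held forever, and
only toy_run(1, 1), which empties each slot before dropping its reference,
ends with neither.
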
Note that the above is _not_ "fix that thing and I'll have no objections";
I think the benefits of that API are nowhere near worth inflicting the
cost on everyone.