Re: [PATCH] coredump/fcntl: Add FD_CLOBCOR flag to close fd before dumping core
From: Xin Zhao
Date: Thu Jun 18 2026 - 02:49:46 EST
On Thu, 18 Jun 2026 00:29:57 -0500 "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> wrote:
> > A coredump typically takes some time to complete. If we happen to hold a
> > write lock with flock just before triggering the coredump, that write lock
> > will not be released during the entire coredump process. As a result,
> > other processes attempting to acquire the same write lock may experience
> > significant delays.
>
> You are talking about giant processes writing to slow backing store?
>
> I suspect you would be better off quickly writing the coredump to a pipe,
> and then writing it to disk.
>
> Unless your machine is badly balanced that should take perhaps a second.
>
> That said I don't see why you need elaborate machinery to do something
> about these file descriptors. Unless I am mistaken no file descriptors
> are placed into a coredump. In which case it should be possible to just
> call exit_files early.
Thank you for your suggestion. Are you suggesting changing the coredump
path to a location such as /tmp and then copying the files from /tmp to
their final storage location? That would indeed be faster, but our tasks
sometimes consume a considerable amount of memory, so the dump time may
still be significant even when writing to a pipe. For our higher-level
code, even 100 ms may be too long. Additionally, while calling
exit_files() directly in coredump_wait() would cover most scenarios, if
coredump_filter includes file-backed shared memory (bit 3), it may not be
appropriate to release all files.
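To make the bit-3 case concrete, here is a minimal user-space sketch that checks whether a process's coredump_filter includes file-backed shared mappings. It assumes the hex format that /proc/<pid>/coredump_filter prints (documented in core(5) and proc(5)). The helper names are illustrative only and are not part of the patch:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Bit 3 of coredump_filter: dump file-backed shared mappings. */
#define COREDUMP_FILTER_FILE_SHARED (1UL << 3)

/* Illustrative helper: parse the hex mask as printed in
 * /proc/<pid>/coredump_filter (e.g. "00000033\n") and report whether
 * file-backed shared memory would be written into the core. */
bool dumps_file_backed_shared(const char *mask_text)
{
    char *end;
    unsigned long mask = strtoul(mask_text, &end, 16);

    if (end == mask_text)          /* no hex digits parsed */
        return false;
    return (mask & COREDUMP_FILTER_FILE_SHARED) != 0;
}

/* Check the current process: 1 = bit 3 set, 0 = clear, -1 = error. */
int self_dumps_file_backed_shared(void)
{
    char buf[32];
    FILE *f = fopen("/proc/self/coredump_filter", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return dumps_file_backed_shared(buf) ? 1 : 0;
}
```

With the common default mask of 0x33, bit 3 is clear. In that case closing every fd early would not affect what ends up in the core.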
I would rather allocate the close_before_core bitmap on demand: if the
task never sets this attribute, the bitmap is never created, so tasks
that do not use the feature also avoid the allocation cost.
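To illustrate the on-demand allocation, here is a user-space sketch. This is not kernel code, and the structure and function names are hypothetical rather than taken from the actual patch. The bitmap pointer stays NULL until the first fd is marked, and queries against a task that never used the flag simply return false without allocating anything:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-task state: bits stays NULL until first use. */
struct clobcor_map {
    unsigned long *bits;   /* one bit per fd, or NULL if never used */
    size_t max_fds;        /* capacity of the fd table */
};

/* Mark fd as close-before-core, allocating the bitmap on first use.
 * Returns 0 on success, -1 on an out-of-range fd or allocation failure. */
int clobcor_set(struct clobcor_map *m, unsigned int fd)
{
    if (fd >= m->max_fds)
        return -1;
    if (!m->bits) {
        size_t words = (m->max_fds + BITS_PER_WORD - 1) / BITS_PER_WORD;

        m->bits = calloc(words, sizeof(unsigned long));
        if (!m->bits)
            return -1;
    }
    m->bits[fd / BITS_PER_WORD] |= 1UL << (fd % BITS_PER_WORD);
    return 0;
}

/* Query without allocating: a missing bitmap means no fd is marked. */
bool clobcor_test(const struct clobcor_map *m, unsigned int fd)
{
    if (!m->bits || fd >= m->max_fds)
        return false;
    return (m->bits[fd / BITS_PER_WORD] >> (fd % BITS_PER_WORD)) & 1UL;
}

/* Release the bitmap, e.g. when the task exits. */
void clobcor_free(struct clobcor_map *m)
{
    free(m->bits);
    m->bits = NULL;
}
```

The coredump path would then only need to walk the bitmap when it is non-NULL. A task that never sets the flag costs a single pointer check.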
> > To address this, we introduce the F_[GET|SET]FD_EX fcntl operation and the
> > FD_CLOBCOR flag, allowing coredump_wait() to release any file descriptors
> > marked with FD_CLOBCOR. We can also assign the FD_CLOBCOR flag to specific
> > shared memory segments, preventing the coredump from including shared
> > memory that we are not interested in, thereby reducing both the coredump
> > duration and the size of the core file.
>
> Please look at vma_dump_size. There are plenty of ways already to
> skip dumping a memory area. Using file backed shared memory,
> and madvise(MADV_DONTDUMP) are two easy ones that already exist.
>
> My point is that there are cleaner ways to solve your problem than
> the solutions you have proposed.
This requirement can indeed be addressed with MADV_DONTDUMP, as you
mentioned; thank you for the suggestion. What still troubles me is the
flock scenario described above.
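For completeness, here is a minimal sketch of the MADV_DONTDUMP approach for anonymous memory we do not want in the core. The helper name alloc_nodump is hypothetical. MADV_DONTDUMP itself is the existing madvise(2) advice, available since Linux 3.4:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Map an anonymous private region and exclude it from core dumps.
 * Returns the mapping, or NULL on failure. */
void *alloc_nodump(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    /* Mark the VMA so the core writer skips it. */
    if (madvise(p, len, MADV_DONTDUMP) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

This only reduces the size of the core and the time spent writing it. It does nothing about a flock held by the dumping task while the dump is in progress.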
Thanks
Xin Zhao