Re: [RFC PATCH] fs/coredump: Enable dynamic configuration of max file note size

From: Allen
Date: Mon Apr 29 2024 - 18:32:56 EST


On Mon, Apr 29, 2024 at 11:38 AM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> On Mon, Apr 29, 2024 at 05:21:28PM +0000, Allen Pais wrote:
> > Introduce the capability to dynamically configure the maximum file
> > note size for ELF core dumps via sysctl. This enhancement removes
> > the previous static limit of 4MB, allowing system administrators to
> > adjust the size based on system-specific requirements or constraints.
> >
> > - Remove hardcoded `MAX_FILE_NOTE_SIZE` from `fs/binfmt_elf.c`.
> > - Define `max_file_note_size` in `fs/coredump.c` with an initial value set to 4MB.
> > - Declare `max_file_note_size` as an external variable in `include/linux/coredump.h`.
> > - Add a new sysctl entry in `kernel/sysctl.c` to manage this setting at runtime.
> >
> > $ sysctl -a | grep max_file_note_size
> > kernel.max_file_note_size = 4194304
> >
> > $ sysctl -n kernel.max_file_note_size
> > 4194304
> >
> > $echo 519304 > /proc/sys/kernel/max_file_note_size
> >
> > $sysctl -n kernel.max_file_note_size
> > 519304
>
> This doesn't highlight anything about *why*. So in practice you must've
> hit a use case where ELF notes are huge, can you give an example of
> that? The commit should also describe that this is only used in the path
> of a coredump on ELF binaries via elf_core_dump().
>

Yes, I should have captured it. We have observed that during a crash
when there are more than 65k mmaps in memory, the existing fixed limit on the
size of the ELF notes section becomes a bottleneck. The notes section quickly
reaches its capacity, leading to incomplete memory segment information in the
resulting coredump. This truncation compromises the utility of the coredumps,
as crucial information about the memory state at the time of the crash
might be omitted.

I will add the above to the commit message. Hope that addresses your concern.

> More below.
>
> > Signed-off-by: Vijay Nag <nagvijay@xxxxxxxxxxxxx>
> > Signed-off-by: Allen Pais <apais@xxxxxxxxxxxxxxxxxxx>
> > ---
> > fs/binfmt_elf.c | 3 +--
> > fs/coredump.c | 3 +++
> > include/linux/coredump.h | 1 +
> > kernel/sysctl.c | 8 ++++++++
> > 4 files changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index 5397b552fbeb..5fc7baa9ebf2 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -1564,7 +1564,6 @@ static void fill_siginfo_note(struct memelfnote *note, user_siginfo_t *csigdata,
> > fill_note(note, "CORE", NT_SIGINFO, sizeof(*csigdata), csigdata);
> > }
> >
> > -#define MAX_FILE_NOTE_SIZE (4*1024*1024)
> > /*
> > * Format of NT_FILE note:
> > *
> > @@ -1592,7 +1591,7 @@ static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm
> >
> > names_ofs = (2 + 3 * count) * sizeof(data[0]);
> > alloc:
> > - if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
> > + if (size >= max_file_note_size) /* paranoia check */
> > return -EINVAL;
> > size = round_up(size, PAGE_SIZE);
> > /*
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index be6403b4b14b..a83c6cc893fc 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -56,10 +56,13 @@
> > static bool dump_vma_snapshot(struct coredump_params *cprm);
> > static void free_vma_snapshot(struct coredump_params *cprm);
> >
> > +#define MAX_FILE_NOTE_SIZE (4*1024*1024)
> > +
> > static int core_uses_pid;
> > static unsigned int core_pipe_limit;
> > static char core_pattern[CORENAME_MAX_SIZE] = "core";
> > static int core_name_size = CORENAME_MAX_SIZE;
> > +unsigned int max_file_note_size = MAX_FILE_NOTE_SIZE;
> >
> > struct core_name {
> > char *corename;
> > diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> > index d3eba4360150..e1ae7ab33d76 100644
> > --- a/include/linux/coredump.h
> > +++ b/include/linux/coredump.h
> > @@ -46,6 +46,7 @@ static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
> > #endif
> >
> > #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL)
> > +extern unsigned int max_file_note_size;
> > extern void validate_coredump_safety(void);
> > #else
> > static inline void validate_coredump_safety(void) {}
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index 81cc974913bb..80cdc37f2fa2 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -63,6 +63,7 @@
> > #include <linux/mount.h>
> > #include <linux/userfaultfd_k.h>
> > #include <linux/pid.h>
> > +#include <linux/coredump.h>
> >
> > #include "../lib/kstrtox.h"
> >
> > @@ -1623,6 +1624,13 @@ static struct ctl_table kern_table[] = {
> > .mode = 0644,
> > .proc_handler = proc_dointvec,
> > },
> > + {
> > + .procname = "max_file_note_size",
> > + .data = &max_file_note_size,
> > + .maxlen = sizeof(unsigned int),
> > + .mode = 0644,
> > + .proc_handler = proc_dointvec,
> > + },
> > #ifdef CONFIG_PROC_SYSCTL
>
> No, please move this to coredump_sysctls in fs/coredump.c. And there is
> no point in supporting int, this is unisgned int right? So use the right
> proc handler for it.
>

Will address it in v2.

> If we're gonna do this, it makes sense to document the ELF note binary
> limiations. Then, consider a defense too, what if a specially crafted
> binary with a huge elf note are core dumped many times, what then?
> Lifting to 4 MiB puts in a situation where abuse can lead to many silly
> insane kvmalloc()s. Is that what we want? Why?
>
You raise a good point. I need to see how we can safely handle this case.

Thanks,
Allen


> Luis



--
- Allen