Re: [PATCH] core_pattern: add CPU specifier

From: Oleksandr Natalenko
Date: Wed Sep 07 2022 - 02:15:30 EST


Hello.

On Wednesday, 7 September 2022 0:22:42 CEST Eric W. Biederman wrote:
> Oleksandr Natalenko <oleksandr@xxxxxxxxxx> writes:
>
> > Statistically, in a large deployment regular segfaults may indicate a CPU issue.
> >
> > Currently, it is not possible to find out which CPU the segfault happened on.
> > There are at least two attempts to improve segfault logging in this regard,
> > but they do not help if the logs have been rotated.
> >
> > Hence, let's make it possible to permanently record the CPU
> > the task ran on using a new core_pattern specifier.
>
> I am puzzled why make it part of the file name, and not part of the
> core file? Say an elf note?

That might be a good idea too; the two approaches are not mutually exclusive.

> The big advantage is that you could always capture the cpu and
> will not need to take special care configuring your system to
> capture that information.

The advantage of recording the CPU in the file name is that, when there are multiple core files, one can summarise them with a simple ls+grep, without invoking a full-featured debugger, to find out whether the segfaults happened on the same CPU.
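
As a rough sketch of that kind of summary, assume a hypothetical core_pattern of "core.%e.%p.cpu%C", so that every core file name ends in ".cpu<N>". The naming scheme and the helper name count_cores_by_cpu are illustrative only and are not part of the patch; the Python below simply tallies core files per CPU from their names.

```python
import os
import re
import sys
from collections import Counter

# Assumption: kernel.core_pattern is set to "core.%e.%p.cpu%C",
# so the CPU number is the trailing ".cpu<N>" component of the name.
CPU_SUFFIX = re.compile(r"\.cpu(\d+)$")


def count_cores_by_cpu(filenames):
    """Return a Counter mapping CPU number -> number of core files."""
    counts = Counter()
    for name in filenames:
        match = CPU_SUFFIX.search(name)
        if match:  # skip files that do not follow the assumed scheme
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Directory holding the core files; defaults to the current one.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    tally = count_cores_by_cpu(os.listdir(directory))
    for cpu, n in tally.most_common():
        print(f"cpu {cpu}: {n} core file(s)")
```

If one CPU dominates the tally across otherwise unrelated executables, that is the kind of hint the specifier is meant to surface.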

Thanks.

> Eric
>
> > Suggested-by: Renaud Métrich <rmetrich@xxxxxxxxxx>
> > Signed-off-by: Oleksandr Natalenko <oleksandr@xxxxxxxxxx>
> > ---
> > Documentation/admin-guide/sysctl/kernel.rst | 1 +
> > fs/coredump.c | 5 +++++
> > include/linux/coredump.h | 1 +
> > 3 files changed, 7 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 835c8844bba48..b566fff04946b 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -169,6 +169,7 @@ core_pattern
> >    %f          executable filename
> >    %E          executable path
> >    %c          maximum size of core file by resource limit RLIMIT_CORE
> > +  %C          CPU the task ran on
> >    %<OTHER>    both are dropped
> >    ========    ==========================================
> >
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index a8661874ac5b6..166d1f84a9b17 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -325,6 +325,10 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> >  				err = cn_printf(cn, "%lu",
> >  					      rlimit(RLIMIT_CORE));
> >  				break;
> > +			/* CPU the task ran on */
> > +			case 'C':
> > +				err = cn_printf(cn, "%d", cprm->cpu);
> > +				break;
> >  			default:
> >  				break;
> >  			}
> > @@ -535,6 +539,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> >  		 */
> >  		.mm_flags = mm->flags,
> >  		.vma_meta = NULL,
> > +		.cpu = raw_smp_processor_id(),
> >  	};
> >
> >  	audit_core_dumps(siginfo->si_signo);
> > diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> > index 08a1d3e7e46d0..191dcf5af6cb9 100644
> > --- a/include/linux/coredump.h
> > +++ b/include/linux/coredump.h
> > @@ -22,6 +22,7 @@ struct coredump_params {
> >  	struct file *file;
> >  	unsigned long limit;
> >  	unsigned long mm_flags;
> > +	int cpu;
> >  	loff_t written;
> >  	loff_t pos;
> >  	loff_t to_skip;

--
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer