Re: [PATCH v5 1/5] sysfs: Add attribute info for /sys/devices/.../coredump_disabled

From: Greg Kroah-Hartman
Date: Fri Aug 12 2022 - 02:10:07 EST


On Thu, Aug 11, 2022 at 04:21:54PM -0700, Manish Mandlik wrote:
> On Wed, Aug 10, 2022 at 9:21 AM Greg Kroah-Hartman <
> gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Wed, Aug 10, 2022 at 06:03:37PM +0200, Johannes Berg wrote:
> > > On Wed, 2022-08-10 at 09:00 -0700, Manish Mandlik wrote:
> > > > This patch adds the specification for
> > /sys/devices/.../coredump_disabled
> > > > attribute which allows the userspace to enable/disable devcoredump for
> > a
> > > > particular device and drivers can use it to enable/disable
> > functionality
> > > > accordingly. It is available when the CONFIG_DEV_COREDUMP is enabled
> > and
> > > > driver has implemented the .coredump() callback.
> > > >
> > >
> > > It would be nice to say _why_? What problem does this solve? You could
> > > just create the dump and discard it, instead, for example?
> >
> > Agreed, I do not understand the need for this at all.
> >
>
> The existing /sys/class/devcoredump/disabled (devcd) switch has two
> limitations - it disables dev_coredump for everyone who's using it;

Which is good and is the design of the thing.

> and
> drivers don't have visibility if devcd is disabled or not, so, the
> dev_coredump API simply lets drivers collect the coredump from a device but
> then later discards it if devcd is disabled.

Why would a driver care?

> Now that there are more subsystems using the base dev_coredump API, having
> a granular control will make it easier to selectively disable dev_coredump
> only for a particular device. For ChromeOS, this is useful to allow drivers
> to develop coredump functionality and deploy it without affecting other
> drivers with stable devcoredump implementations (example, we've had some
> devcoredumps that take 12s to run and we only want to enable it on test
> builds because it has lots of PII). The drivers can use this flag to
> refrain from collecting or triggering coredump when undesirable.

This feels odd. You have various out-of-tree drivers that take too long
when they crash to make a dump and it causes some unknown issue
elsewhere?

I don't really understand, sorry.

If you need something for development of a system, that's one thing, but
this feels like you are adding fine-grained tweaks that no one in a real
system would ever use.

What is broken with the current system of "on/off" that does not work
for you now? Why would a normal user only want to turn this on for one
driver and not another?

thanks,

greg k-h