Re: [PATCH 2/3] mwifiex: support sysfs initiated device coredump

From: Brian Norris
Date: Mon Feb 26 2018 - 17:06:23 EST


Hi,

On Fri, Feb 23, 2018 at 2:51 AM, Johannes Berg
<johannes@xxxxxxxxxxxxxxxx> wrote:
> On Fri, 2018-02-23 at 11:39 +0100, Arend van Spriel wrote:
>
> > > > Well, that depends on the eye of the beholder I guess. From user-space
> > > > perspective it is asynchronous regardless. A write access to the coredump
> > > > sysfs file eventually results in a uevent when the devcoredump entry is
> > > > created, i.e. after the driver has made a dev_coredump API call. Whether the
> > > > driver does that synchronously or asynchronously is irrelevant as far as
> > > > user-space is concerned.
> > >
> > > Is it really? The driver infrastructure seems to guarantee that the
> > > entirety of a driver's ->coredump() will complete before returning from
> > > the write. So it might be reasonable for some user to assume (based on
> > > implementation details, e.g., of brcmfmac) that the devcoredump will be
> > > ready by the time the write() syscall returns, absent documentation that
> > > says otherwise. But then, that's not how mwifiex works right now, so
> > > they might be surprised if they switch drivers.
>
> I can see how you might want to have that kind of behaviour, but you'd
> have to jump through some hoops to see if the coredump you saw is
> actually the right one - you probably want an asynchronous coredump
> "collector" and then wait for it to show up (with some reasonable
> timeout) on the actual filesystem, not on sysfs?
>
> Otherwise you have to trawl sysfs for the right coredump I guess, which
> too is possible.

It's not that I want that interface. It's that I want the *lack* of
such an interface to be guaranteed in the documentation. When
questions like "where? when?" are not answered in the doc, users are
totally allowed to speculate ;) Perhaps the "where" can be deferred to
other documentation (which should probably exist someday), but the
"when" should be listed as "eventually; or not at all; listen for a
uevent."

> > > > You are right. Clearly I did not reach the end of my learning curve here. I
> > > > assumed referring to the existing dev_coredump facility was sufficient, but
> > > > maybe it is worth a patch to be more explicit and mention the uevent
> > > > behavior. Also, the dev_coredump facility may be disabled, in which case the
> > > > trigger has no visible effect in sysfs: in the kernel, the data passed by the
> > > > driver is simply freed by the dev_coredump facility.
> > >
> > > Is there any other documentation for the coredump feature? I don't
> > > really see much.
> >
> > Any other than the code itself you mean? I am not sure. Maybe Johannes
> > knows.
>
> There isn't really. It was originally really simple, but then somebody
> (Kees perhaps?) requested a way to turn it off forever for security or
> privacy concerns, and it became more complicated.

Then when adding a new sysfs ABI, I don't think we should be deferring
to "existing dev_coredump facility [documentation]" (which doesn't
exist). Just a few words about the user-facing interface would be
nice for the documentation. There previously wasn't any official way
to trigger a dump from userspace -- only from random debugfs files, I
think, or from unspecified device failures.

> > > static ssize_t coredump_store(struct device *dev, struct device_attribute *attr,
> > >                               const char *buf, size_t count)
> > > {
> > >         device_lock(dev);
> > >         if (dev->driver->coredump)
> > >                 dev->driver->coredump(dev);
> > >         device_unlock(dev);
> > >
> > >         return count;
> > > }
> > > static DEVICE_ATTR_WO(coredump);
> > >
> > > Is that a bug or a feature?
> >
> > Yeah. Let's call it a bug. Just not sure what to go for. Return the
> > error or change coredump callback to void return type.
>
> I'm not sure it matters all that much - the underlying devcoredump
> calls all have no return value (void), and given the above complexities
> with the ability to turn off devcoredumping entirely you cannot rely on
> this return value to tell you if a dump was created or not, at least
> not without much more infrastructure work.

Then perhaps it makes sense to remove the return code before you
create users of it.
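
Roughly what I have in mind, as a user-space mock rather than a real
patch: the struct definitions below are stand-ins for illustration (not
the kernel's), and coredump_trigger() and fake_coredump() are made-up
names. The point is only that with a void callback there is no return
value for the sysfs handler to drop or for a driver to rely on.

```c
#include <stddef.h>

/* Stand-in types for illustration only; not the real kernel structs. */
struct device;

struct device_driver {
        /* void: there is no result for a caller to (mis)interpret. */
        void (*coredump)(struct device *dev);
};

struct device {
        struct device_driver *driver;
};

/*
 * Hypothetical mock with the same shape as the coredump_store() quoted
 * above, minus locking and the sysfs plumbing. It always reports the
 * whole write as consumed, whether or not a dump was requested.
 */
static size_t coredump_trigger(struct device *dev, size_t count)
{
        if (dev->driver && dev->driver->coredump)
                dev->driver->coredump(dev);
        return count;
}

/* Example callback: only records that a dump was requested. */
static int dumps_requested;

static void fake_coredump(struct device *dev)
{
        (void)dev;
        dumps_requested++;
}
```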

Brian