Re: [PATCH 1/2] devcoredump: Remove devcoredump device if failing device is gone

From: Rodrigo Vivi
Date: Tue Jan 30 2024 - 10:50:07 EST


On Tue, Jan 30, 2024 at 04:19:18PM +0100, Johannes Berg wrote:
> On Tue, 2024-01-30 at 10:16 -0500, Rodrigo Vivi wrote:
> > >
> > > But I'd rather not, it
> > > feels weird to have a need for it.
> >
> > We could change our CI and instruct our devs to always write
> > something to 'data' to ensure that devcoredump is deleted
> > before we can reload our module. Maybe that's the right
> > approach indeed, although I would really prefer to have
> > a direct way.
>
> That's not really what I meant :-) I think we can agree that it's wrong
> for the kernel to be _able_ to run into some kind of use-after-free if
> userspace isn't doing the right thing here!
>
> What I meant though is: it's weird for 'data' to actually depend on the
> struct device being still around, no? Whatever you want 'data' to be,
> couldn't you arrange it so that it's valid as long as the module isn't
> removed, so that the 'data' pointer literally encapsulates the needed
> data, doesn't depend on anything else, and the method you pass is more
> like a 'format' method.

I'm sorry for not being clear here. I totally agree with you.

I will change our driver so that 'data' is a standalone block of memory
that devcoredump itself frees. That ensures no use-after-free and no NULL
dereference, and the data can still be read even after the driver is unbound.

What I meant by userspace 'writing to data' was that our CI would
run something like

if /sys/.../device/devcd<n> exists, then
echo 1 > /sys/.../device/devcd<n>/data
before attempting the rmmod <driver>
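
As a rough sketch only, that CI step could look something like the helper
below. The function name clear_devcd and the DEV_DIR/DRIVER variables are
made up for illustration; the real sysfs path is whatever the elided
/sys/.../device above resolves to on the machine under test.

```shell
#!/bin/sh
# Hypothetical CI helper (not part of any existing tool): dismiss every
# pending devcoredump found under a device directory by writing to its
# 'data' file, so that the dump is gone before the module is unloaded.
clear_devcd() {
    dev_dir=$1
    for d in "$dev_dir"/devcd*; do
        # If nothing matches, the glob stays literal and this check skips it.
        [ -e "$d/data" ] || continue
        echo 1 > "$d/data"
    done
}

# Intended use (DEV_DIR and DRIVER are placeholders set by the CI job):
#   clear_devcd "$DEV_DIR" && rmmod "$DRIVER"
```

The loop is a no-op when no devcd<n> entry exists, so it should be safe to
run unconditionally before every rmmod.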

Our rmmod cannot get stuck, or our CI is blocked. Currently the only
way to avoid blocking the rmmod is to make sure the devcd is gone, so
that the module_put has happened.

>
> johannes