Re: [PATCH 2/4] cxl/mem: Fix cdev_device_add() error handling

From: Dan Williams
Date: Tue Mar 30 2021 - 00:49:54 EST


On Mon, Mar 29, 2021 at 3:44 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Mon, Mar 29, 2021 at 02:03:37PM -0700, Dan Williams wrote:
>
> > Ugh, this is exactly why I was motivated to preclude this with new
> > core infrastructure that would fix it centrally [1]: remove the
> > possibility of "others" getting this wrong. However, after my
> > initial idea bounced off Greg, I ended up shipping this bug in the
> > local rewrite. I think the debugfs API gets this right in terms of
> > centralizing the reference count management, and I want to see
> > something similar for common driver ioctl patterns.
>
> There is a lot of variety here; I'm not sure how much valuable common
> code there is that could be lifted into the core. srcu, refcount,
> rwsem, percpu_ref, etc. are all common implementations with different
> properties.
>
> The easiest implementation is to just block driver destruction with a
> refcount & completion pattern.
>
> The hardest is to allow the underlying HW driver to be removed from
> the fops while the file remains open.
>
> Usually whatever scheme is used has to flow into some in-kernel API as
> well, so isolating it in cdev may not be entirely helpful.
>
> The easiest helper API would be to add an 'unregistration lock' to
> the struct device that blocks unregistration, a refcount & completion
> for instance. I've seen that open coded enough times.
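A minimal userspace sketch of the refcount & completion "unregistration lock" described above, assuming pthreads as a stand-in for the kernel primitives. All names here (struct unreg_lock, unreg_lock_get(), unreg_lock_put(), unreg_lock_kill_and_wait(), and the completion_* helpers) are hypothetical, not an existing driver-core API; in-kernel code would more likely build this from refcount_t or percpu_ref plus struct completion. Each fop takes a reference before touching the driver; the remove path marks the lock dying, drops the registration's initial reference, and blocks until the last in-flight user drops theirs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct completion (hypothetical). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_init(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void completion_signal(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void completion_wait(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical unregistration lock: the registration owns the first ref. */
struct unreg_lock {
	pthread_mutex_t lock;
	int refs;                   /* 1 (registration) + in-flight users */
	bool dying;                 /* set once unregistration has started */
	struct completion released; /* signalled when refs reaches 0 */
};

static void unreg_lock_init(struct unreg_lock *u)
{
	pthread_mutex_init(&u->lock, NULL);
	u->refs = 1;
	u->dying = false;
	completion_init(&u->released);
}

/* Called at the top of each fop; fails once unregistration has begun. */
static bool unreg_lock_get(struct unreg_lock *u)
{
	bool ok;

	pthread_mutex_lock(&u->lock);
	ok = !u->dying;
	if (ok)
		u->refs++;
	pthread_mutex_unlock(&u->lock);
	return ok;
}

/* Drops a reference; the last one wakes the waiting remove path. */
static void unreg_lock_put(struct unreg_lock *u)
{
	bool last;

	pthread_mutex_lock(&u->lock);
	assert(u->refs > 0);
	last = (--u->refs == 0);
	pthread_mutex_unlock(&u->lock);
	if (last)
		completion_signal(&u->released);
}

/* Remove path: refuse new users, then block until in-flight ones finish. */
static void unreg_lock_kill_and_wait(struct unreg_lock *u)
{
	pthread_mutex_lock(&u->lock);
	u->dying = true;
	pthread_mutex_unlock(&u->lock);
	unreg_lock_put(u); /* drop the registration's own reference */
	completion_wait(&u->released);
}
```

The cost of this scheme is that removal blocks for as long as any fop holds a reference, which is what makes it the easy case rather than the hot-removal-friendly one.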

I do agree there is too much variety to unify widely. At the same time,
it is a common enough pattern for devices that allow removal before
final close. For devices that support hot-removal in particular,
disconnecting is a better pattern than blocking unregistration.

Just the small matter of time to see this through...