Re: [PATCH 2/4] cxl/mem: Fix cdev_device_add() error handling

From: Jason Gunthorpe
Date: Mon Mar 29 2021 - 18:45:29 EST


On Mon, Mar 29, 2021 at 02:03:37PM -0700, Dan Williams wrote:

> Ugh, exactly why I was motivated to attempt to preclude this with new
> core infrastructure that attempted to fix this centrally [1]. Remove
> the possibility of "others" getting this wrong. However after my
> initial idea bounced off Greg then I ended up shipping this bug in the
> local rewrite. I think the debugfs api gets this right in terms of
> centralizing the reference count management, and I want to see
> something similar for common driver ioctl patterns.

There is a lot of variety here, so I'm not sure how much valuable common
code there will be that could be lifted into the core. srcu,
refcount, rwsem, percpu_ref, etc. are all common implementations with
various properties.

The easiest implementation is to just block driver destruction with a
refcount & completion pattern.

The hardest is to allow the underlying HW driver to be removed from
the fops while the file remains open.

Usually whatever scheme is used has to flow into some in-kernel API as
well, so isolating it in cdev may not be entirely helpful.

The easiest helper API would be to add an 'unregistration lock' to
the struct device that blocks unregistration. A refcount & completion
for instance. I've seen that open coded enough times.
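
A rough userspace sketch of what such an 'unregistration lock' might
look like, assuming a mutex-protected user count and a condition
variable in place of the kernel's refcount_t and struct completion. The
names (struct unreg_lock, unreg_lock_tryget(), ...) are made up for
illustration; no such helper exists in struct device today.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical 'unregistration lock': a refcount plus a completion. */
struct unreg_lock {
	pthread_mutex_t mtx;
	pthread_cond_t drained;  /* stands in for struct completion */
	unsigned int users;      /* callers currently inside the driver */
	bool dying;              /* set once unregistration has started */
};

static void unreg_lock_init(struct unreg_lock *l)
{
	pthread_mutex_init(&l->mtx, NULL);
	pthread_cond_init(&l->drained, NULL);
	l->users = 0;
	l->dying = false;
}

/* Entry to an fop: fails once unregistration has begun. */
static bool unreg_lock_tryget(struct unreg_lock *l)
{
	bool ok;

	pthread_mutex_lock(&l->mtx);
	ok = !l->dying;
	if (ok)
		l->users++;
	pthread_mutex_unlock(&l->mtx);
	return ok;
}

/* Exit from an fop: the last user out wakes the unregistering thread. */
static void unreg_lock_put(struct unreg_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	if (--l->users == 0 && l->dying)
		pthread_cond_signal(&l->drained);
	pthread_mutex_unlock(&l->mtx);
}

/* Unregistration: refuse new users, then block until all have left. */
static void unreg_lock_unregister(struct unreg_lock *l)
{
	pthread_mutex_lock(&l->mtx);
	l->dying = true;
	while (l->users > 0)
		pthread_cond_wait(&l->drained, &l->mtx);
	pthread_mutex_unlock(&l->mtx);
}
```

Each fop would bracket its work with unreg_lock_tryget() and
unreg_lock_put(), and the remove path would call
unreg_lock_unregister() before freeing anything the fops touch.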

Jason