Re: [PATCH] drm: inhibit drm drivers register to uninitialized drm core
From: Daniel Vetter
Date: Tue Jul 11 2017 - 03:53:30 EST
On Mon, Jul 10, 2017 at 09:56:20PM +0200, Alexandru Moise wrote:
> On Mon, Jul 10, 2017 at 08:00:37PM +0200, Daniel Vetter wrote:
> > On Mon, Jul 10, 2017 at 9:14 AM, Alexandru Moise
> > <00moses.alexander00@xxxxxxxxx> wrote:
> > > On Mon, Jul 10, 2017 at 08:52:46AM +0200, Daniel Vetter wrote:
> > >> On Sat, Jul 08, 2017 at 11:43:52PM +0200, Alexandru Moise wrote:
> > >> > If the DRM core fails to init for whatever reason, ensure that
> > >> > no driver ever calls drm_dev_register().
> > >> >
> > >> > This is best done at drm_dev_init() as it covers drivers that call
> > >> > drm_dev_alloc() as well as drivers that prefer to embed struct
> > >> > drm_device into their own device struct and call drm_dev_init()
> > >> > themselves.
> > >> >
> > >> > In my case I had so many dynamic device majors used that the major
> > >> > number for DRM (226) was stolen, causing DRM core init to fail after
> > >> > failing to register a chrdev, and ultimately calling debugfs_remove()
> > >> > on drm_debugfs_root in drm_core_exit().
> > >> >
> > >> > After drm core failed to init, VGEM was still calling drm_dev_register(),
> > >> > ultimately leading to drm_debugfs_init(), with drm_debugfs_root passed
> > >> > as the root for the new debugfs dir at debugfs_create_dir().
> > >> >
> > >> > This led to a kernel panic once we were either derefencing root->d_inode
> > >> > while it was NULL or calling root->d_inode->i_op->lookup() while it was
> > >> > NULL in debugfs at inode_lock() or lookup_*().
> > >> >
> > >> > Signed-off-by: Alexandru Moise <00moses.alexander00@xxxxxxxxx>
> > >> > ---
> > >> > drivers/gpu/drm/drm_drv.c | 16 ++++++++++++++++
> > >> > 1 file changed, 16 insertions(+)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > >> > index 37b8ad3e30d8..2ed2d919beae 100644
> > >> > --- a/drivers/gpu/drm/drm_drv.c
> > >> > +++ b/drivers/gpu/drm/drm_drv.c
> > >> > @@ -63,6 +63,15 @@ module_param_named(debug, drm_debug, int, 0600);
> > >> > static DEFINE_SPINLOCK(drm_minor_lock);
> > >> > static struct idr drm_minors_idr;
> > >> >
> > >> > +/*
> > >> > + * If the drm core fails to init for whatever reason,
> > >> > + * we should prevent any drivers from registering with it.
> > >> > + * It's best to check this at drm_dev_init(), as some drivers
> > >> > + * prefer to embed struct drm_device into their own device
> > >> > + * structure and call drm_dev_init() themselves.
> > >> > + */
> > >> > +static bool drm_core_init_complete = false;
> > >> > +
> > >> > static struct dentry *drm_debugfs_root;
> > >> >
> > >> > #define DRM_PRINTK_FMT "[" DRM_NAME ":%s]%s %pV"
> > >> > @@ -484,6 +493,11 @@ int drm_dev_init(struct drm_device *dev,
> > >> > {
> > >> > int ret;
> > >> >
> > >> > + if (!drm_core_init_complete) {
> > >> > + DRM_ERROR("DRM core is not initialized\n");
> > >> > + return -ENODEV;
> > >> > + }
> > >> > +
> > >> > kref_init(&dev->ref);
> > >> > dev->dev = parent;
> > >> > dev->driver = driver;
> > >> > @@ -966,6 +980,8 @@ static int __init drm_core_init(void)
> > >> > if (ret < 0)
> > >> > goto error;
> > >> >
> > >> > + drm_core_init_complete = true;
> > >> > +
> > >> > DRM_DEBUG("Initialized\n");
> > >> > return 0;
> > >>
> > >> Isn't the correct fix to pass down the error value, which iirc should
> > >> make the kmod stuff unload the module again? Or does this not work'
> > >> -Daniel
> > >
> > > What if everything is built in?
> >
> > I feared that would be the answer :-/ Still feels funny that everyone
> > will need to hand-roll this, or does everyone simply assume that their
> > subsystem's module_init never fails?
> >
> > Adding a pile of kmod and driver folks in the hopes of getting a
> > better answer. If there's no better answer pls remind me to merge your
> > patch in 1-2 weeks, I'll likely forget ...
> > -Daniel
>
> I took a look at all the DRM drivers and they all seem pretty sane,
> they all handle the case of the DRM core failure, the fault is clearly
> on the DRM core side, it makes absolutely no sense trying to register
> with a subsystem that failed to init.
>
> This is certainly not a subsystem I am intimately acquainted with
> however, I just stumbled on this crash while working with debugfs.
>
> Most of the DRM core code looks pretty solid, and this is a relatively
> small corner-case, I believe most people out there won't be eating up
> that many major numbers. But I believe it's still the Right Thingâ to do.
>
> It might save some headaches in the future as DRM becomes more complex.
I'm not worried about drm or drm drivers here, I'm just wondered how all
the other subystems handle this same problem. Pretty sure we're not the
only ones who register a chardev block in the core subsystem module (and
sysfs classes, and tons of other stuff), and then drivers would fall over
if that's not there. Having to open code a $foo_init_successful everywhere
seems somewhat silly.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch