Re: [PATCH v8 11/12] zram: fix crashes with cpu hotplug multistate

From: Ming Lei
Date: Tue Nov 02 2021 - 11:18:24 EST


On Tue, Nov 02, 2021 at 03:51:33PM +0100, Petr Mladek wrote:
> On Tue 2021-11-02 15:15:19, Petr Mladek wrote:
> > On Tue 2021-10-26 23:37:30, Ming Lei wrote:
> > > On Tue, Oct 26, 2021 at 10:48:18AM +0200, Petr Mladek wrote:
> > > > Below are more details about the livepatch code. I hope that it will
> > > > help you to see if zram has similar problems or not.
> > > >
> > > > We have kobject in three structures: klp_func, klp_object, and
> > > > klp_patch, see include/linux/livepatch.h.
> > > >
> > > > These structures have to be statically defined in the module sources
> > > > because they define what is livepatched, see
> > > > samples/livepatch/livepatch-sample.c
> > > >
> > > > The kobject is used there to show information about the patch, patched
> > > > objects, and patched functions, in sysfs. And most importantly,
> > > > the sysfs interface can be used to disable the livepatch.
> > > >
> > > > The problem with static structures is that the module must stay
> > > > in memory as long as the sysfs interface exists. This could be
> > > > solved in the module_exit() callback, which could wait until the
> > > > sysfs interface is destroyed.
> > > >
> > > > kobject API does not support this scenario. The release() callbacks
> > >
> > > kobject_delete() is for supporting this scenario, that is why we don't
> > > need to grab module refcnt before calling show()/store() of the
> > > kobject's attributes.
> > >
> > > kobject_delete() can be called in module_exit(), then any show()/store()
> > > will be done after kobject_delete() returns.
> >
> > I am a bit confused. I do not see kobject_delete() anywhere in kernel
> > sources.
> >
> > I see only kobject_del() and kobject_put(). AFAIK, they do _not_
> > guarantee that either the sysfs interface was destroyed or
> > the release callbacks were called. For example, see
> > schedule_delayed_work(&kobj->release, delay) in kobject_release().
>
> Grr, I always get confused by the code. kobject_del() actually waits
> until the sysfs interface gets destroyed. This is why there is
> the deadlock.

Right.

>
> But kobject_put() is _not_ synchronous. And the comment above
> kobject_add() repeats 3 times that kobject_put() must be called
> on success:
>
> * Return: If this function returns an error, kobject_put() must be
> * called to properly clean up the memory associated with the
> * object. Under no instance should the kobject that is passed
> * to this function be directly freed with a call to kfree(),
> * that can leak memory.
> *
> * If this function returns success, kobject_put() must also be called
> * in order to properly clean up the memory associated with the object.
> *
> * In short, once this function is called, kobject_put() MUST be called
> * when the use of the object is finished in order to properly free
> * everything.
>
> and similar text in Documentation/core-api/kobject.rst
>
> After a kobject has been registered with the kobject core successfully, it
> must be cleaned up when the code is finished with it. To do that, call
> kobject_put().
>
>
> If I read the code correctly then kobject_put() calls kref_put()
> that might call kobject_delayed_cleanup(). This function does a lot
> of things and needs to access struct kobject.

Yes, but what problem does that cause here, given that kobject_put()
may not be synchronous?

>
> > IMHO, kobject API does not support static structures and module
> > removal.
>
> If kobject_put() has to be called also for static structures then
> module_exit() must explicitly wait until the clean up is finished.

Right, that is exactly how the klp_patch kobject is implemented. The
klp_patch has to be disabled first, and the module refcnt can be dropped
only after the klp_patch kobject is released. Only then is module_exit()
possible.
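
To make that ordering concrete, here is a minimal userspace sketch. It is
not the livepatch code: every toy_* name is made up for illustration,
'released' stands in for what would be a struct completion in the kernel,
and 'mod_ref_held' stands in for the module refcnt. It only models the
rule "drop the module reference after release() has run", assuming a
single thread and a synchronous put.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kobject: a refcount plus a release hook. */
struct toy_kobj {
	int refcount;
	void (*release)(struct toy_kobj *kobj);
};

/* Statically allocated owner, like a klp_patch living in module memory. */
struct toy_patch {
	struct toy_kobj kobj;
	bool released;		/* stands in for a struct completion */
	bool mod_ref_held;	/* stands in for the module refcnt */
};

void toy_kobj_get(struct toy_kobj *kobj)
{
	kobj->refcount++;
}

/* Drop one reference; the release hook runs only when the last one goes. */
void toy_kobj_put(struct toy_kobj *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0 && kobj->release)
		kobj->release(kobj);
}

/* Release hook: nothing is freed (the object is static), only signalled. */
void toy_patch_release(struct toy_kobj *kobj)
{
	struct toy_patch *patch = (struct toy_patch *)
		((char *)kobj - offsetof(struct toy_patch, kobj));

	patch->released = true;
}

void toy_patch_init(struct toy_patch *patch)
{
	patch->kobj.refcount = 1;
	patch->kobj.release = toy_patch_release;
	patch->released = false;
	patch->mod_ref_held = true;
}

/*
 * Returns true once the module may go away. The module reference is
 * dropped only after release() has run, i.e. after the last access
 * to the static structure.
 */
bool toy_patch_try_finish(struct toy_patch *patch)
{
	if (!patch->released)
		return false;
	patch->mod_ref_held = false;
	return true;
}
```

As long as anyone else (say, an open sysfs file) still holds a reference,
toy_patch_try_finish() returns false and the module reference stays. Real
kernel code would block on the completion instead of polling a flag.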

Thanks,
Ming