Re: [PATCH v7 03/15] s390/zcrypt: driver callback to indicate resource in use

From: Harald Freudenberger
Date: Fri Apr 17 2020 - 09:54:27 EST


On 16.04.20 11:33, Cornelia Huck wrote:
> On Wed, 15 Apr 2020 08:08:24 +0200
> Harald Freudenberger <freude@xxxxxxxxxxxxx> wrote:
>
>> On 14.04.20 14:58, Cornelia Huck wrote:
>>> On Tue, 7 Apr 2020 15:20:03 -0400
>>> Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
>>>> + /* The non-default driver's module must be loaded */
>>>> + if (!try_module_get(drv->owner))
>>>> + return 0;
>>> Is that really needed? I would have thought that the driver core's
>>> klist usage would make sure that the callback would not be invoked for
>>> drivers that are not registered anymore. Or am I missing a window?
>> The try_module_get() and module_put() is a result of review feedback from
>> my side. The ap bus core is static in the kernel whereas the
>> vfio dd is a kernel module. So there may be a race condition between
>> calling the callback function and removal of the vfio dd module.
>> There is similar code in zcrypt_api which does the same for the zcrypt
>> device drivers before using some variables or functions from the modules.
>> Help me, it this is outdated code and there is no need to adjust the
>> module reference counter any more, then I would be happy to remove
>> this code :-)
> I think the driver core already should keep us safe. A built-in bus
> calling a driver in a module is a very common pattern, and I think
> ->owner was introduced exactly for that case.
>
> Unless I'm really missing something obvious?
Hm. I tested a similar code (see zcrypt_api.c where try_module_get() and module_put()
is called surrounding use of functions related to the implementing driver.
The driver module has a reference count of 0 when not used and can get removed
- because refcount is 0 - at any time when there is nothing related to the driver pending.

As soon as the driver is actually used the try_module_get(...driver.owner) increases
the reference counter and makes it impossible to remove the module. After use the
module_put() reduces the reference count.
When I now remove the try_module_get() and module_put() calls and run this modified
code I immediately face a crash when the module is removed during use.

I see code in the kernel which does an initial try_module_get() on the driver to increase
the reference count, for example when the driver registers. However, I see no clear
way to remove such a driver module any more.

I know I had a fight with a tester some years ago where he stated that it is a valid
testcase to remove a device driver module 'during use of the driver'. So I'd like
to have the try_module_get() and module_put() invokations in the ap bus code
until you convince me there are other maybe better ways to make sure the
driver and it's functions are available at the time of the call.

Maybe we can discuss this offline if you wish :-)