Re: refactor the i915 GVT support
From: Wang, Zhi A
Date: Tue Sep 28 2021 - 03:41:08 EST
Hey guys:
After some investigation, I found the root cause this problem ("i915"
module loading will be stuck with Christoph's refactor patches), which
can be reproduced by building both i915 and kvmgt as kernel module and
the loading i915.
The root cause is: in Linux kernel loading, before a kernel module
loading is finished, its symbols can not be reached by other module when
resolving the symbols (even they can be found in /proc/kallsyms).
Because the status of the kernel module is MODULE_STATE_COMING and
resolve_symbol() from another kernel module will check this and return a
-EBUSY.
In this case, before i915 loading is finished, the requested module
"kvmgt" cannot reach the symbols in module i915. Thus it kept waiting
and left message like below in the dmesg:
[ 644.152021] kvmgt: gave up waiting for init of module i915.
[ 644.152039] kvmgt: Unknown symbol i915_gem_object_set_to_cpu_domain
(err -16)
[ 674.871409] kvmgt: gave up waiting for init of module i915.
[ 674.871427] kvmgt: Unknown symbol intel_ring_begin (err -16)
[ 705.590586] kvmgt: gave up waiting for init of module i915.
[ 705.590604] kvmgt: Unknown symbol i915_vma_move_to_active (err -16)
[ 736.310230] kvmgt: gave up waiting for init of module i915.
[ 736.310248] kvmgt: Unknown symbol shmem_unpin_map (err -16)
...
The error message is from execution path below:
kernel/module.c:
[i915 module loading] ->
request_module("kvmgt")->[modprobe]->init_module("kvmgt")->load_module()->simplify_symbols()->resolve_symbol_wait():
static const struct kernel_symbol *
resolve_symbol_wait(struct module *mod,
const struct load_info *info,
const char *name)
{
const struct kernel_symbol *ksym;
char owner[MODULE_NAME_LEN];
if (wait_event_interruptible_timeout(module_wq,
!IS_ERR(ksym = resolve_symbol(mod, info, name, owner))
|| PTR_ERR(ksym) != -EBUSY,
30 * HZ) <= 0) {
pr_warn("%s: gave up waiting for init of module %s.\n",
mod->name, owner);
}
code:
https://github.com/intel/gvt-linux/blob/bd950a66c7919d7121d2530f30984351534a96dc/kernel/module.c#L1452
In resolve_symbol_wait(), it calls resolve_symbol() to resolve the
symbols in "i915". In resolve_symbol() -> ref_module() ->
strong_try_module_get(), it will check the status of the module which
owns the symbol.
static inline int strong_try_module_get(struct module *mod)
{
BUG_ON(mod && mod->state == MODULE_STATE_UNFORMED);
if (mod && mod->state == MODULE_STATE_COMING)
return -EBUSY;
if (try_module_get(mod))
return 0;
else
return -ENOENT;
}
code:https://github.com/intel/gvt-linux/blob/bd950a66c7919d7121d2530f30984351534a96dc/kernel/module.c#L318
But unfortunately, this execution path begins in i915 module loading, at
this time, the status of kernel module "i915" is MODULE_STATE_COMING
until loading of "kvmgt" is finished. Thus a -EBUSY is always returned
when kernel is trying to resolve symbols for "kvmgt".
This patch below might need re-work:
Author: Christoph Hellwig <hch@xxxxxx>
Date: Wed Jul 21 17:53:38 2021 +0200
drm/i915/gvt: move the gvt code into kvmgt.ko
Instead of having an option to build the gvt code into the main i915
module, just move it into the kvmgt.ko module. This only requires
a new struct with three entries that the main i915 module needs to
request before enabling VGPU passthrough operations.
This also conveniently streamlines the GVT initialization and avoids
the need for the global device pointer.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>
Link:
http://patchwork.freedesktop.org/patch/msgid/20210721155355.173183-5-hch@xxxxxx
Acked-by: Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>
On 8/26/21 6:12 AM, Zhenyu Wang wrote:
> On 2021.08.20 12:56:34 -0700, Luis Chamberlain wrote:
>> On Fri, Aug 20, 2021 at 04:17:24PM +0200, Christoph Hellwig wrote:
>>> On Thu, Aug 19, 2021 at 04:29:29PM +0800, Zhenyu Wang wrote:
>>>> I'm working on below patch to resolve this. But I met a weird issue in
>>>> case when building i915 as module and also kvmgt module, it caused
>>>> busy wait on request_module("kvmgt") when boot, it doesn't happen if
>>>> building i915 into kernel. I'm not sure what could be the reason?
>>> Luis, do you know if there is a problem with a request_module from
>>> a driver ->probe routine that is probably called by a module_init
>>> function itself?
>> Generally no, but you can easily foot yourself in the feet by creating
>> cross dependencies and not dealing with them properly. I'd make sure
>> to keep module initialization as simple as possible, and run whatever
>> takes more time asynchronously, then use a state machine to allow
>> you to verify where you are in the initialization phase or query it
>> or wait for a completion with a timeout.
>>
>> It seems the code in question is getting some spring cleaning, and its
>> unclear where the code is I can inspect. If there's a tree somewhere I
>> can take a peak I'd be happy to review possible oddities that may stick
>> out.
> I tried to put current patches under test here: https://github.com/intel/gvt-linux/tree/gvt-staging
> The issue can be produced with CONFIG_DRM_I915=m and CONFIG_DRM_I915_GVT_KVMGT=m.
>
>> My goto model for these sorts of problems is to abstract the issue
>> *outside* of the driver in question and implement new selftests to
>> try to reproduce. This serves two purposes, 1) helps with testing
>> 2) may allow you to see the problem more clearly.
>>
> I'll see if can abstract that.
>
> Thanks, Luis.