Re: [PATCH] module: Add more error message for failed kernel module loading
From: Prarit Bhargava
Date: Tue Sep 01 2020 - 20:06:34 EST
On 9/1/20 4:17 PM, Lucas De Marchi wrote:
> On Tue, Sep 1, 2020 at 12:56 PM Prarit Bhargava <prarit@xxxxxxxxxx> wrote:
>>
>>
>>
>> On 9/1/20 2:50 PM, Lucas De Marchi wrote:
>>> On Sat, Aug 29, 2020 at 4:15 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
>>>>
>>>> When kernel module loading failed, user space only get one of the
>>>> following error messages:
>>>> - -ENOEXEC
>>>> This is the most confusing one. From corrupted ELF header to bad
>>>> WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
>>>> all returns this error number.
>>>>
>>>> - -EPERM
>>>> This is for blacklisted modules. But mod doesn't do extra explain
>>>> on this error either.
>>>>
>>>> - -ENOMEM
>>>> The only error which needs no explain.
>>>>
>>>> This means, if a user got "Exec format error" from modprobe, it provides
>>>> no meaningful way for the user to debug, and will take extra time
>>>> communicating to get extra info.
>>>>
>>>> So this patch will add extra error messages for -ENOEXEC and -EPERM
>>>> errors, allowing user to do better debugging and reporting.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>>>> ---
>>>> kernel/module.c | 11 +++++++++--
>>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/module.c b/kernel/module.c
>>>> index 1c5cff34d9f2..9f748c6eeb48 100644
>>>> --- a/kernel/module.c
>>>> +++ b/kernel/module.c
>>>> @@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
>>>> int i;
>>>>
>>>> for (i = 0; i < hdr->e_shnum; i++) {
>>>> - if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
>>>> + if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
>>>> + pr_err(
>>>> + "Module %s section %d has invalid WRITE|EXEC flags\n",
>>>> + mod->name, i);
>>>> return -ENOEXEC;
>>>> + }
>>>> }
>>>>
>>>> return 0;
>>>> @@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>>> char *after_dashes;
>>>>
>>>> err = elf_header_check(info);
>>>> - if (err)
>>>> + if (err) {
>>>> + pr_err("Module has invalid ELF header\n");
>>>> goto free_copy;
>>>> + }
>>>>
>>>> err = setup_load_info(info, flags);
>>>> if (err)
>>>> @@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>>>
>>>> if (blacklisted(info->name)) {
>>>> err = -EPERM;
>>>> + pr_err("Module %s is blacklisted\n", info->name);
>>>
>>> I wonder why would anyone actually add the blacklist to the command
>>> line like this and have no
>>> way to revert that back. This was introduced in
>>
>> Debug. Debug. Debug. ;) The parameter was added to debug broken installations
>> and kernel boots.
>>
>>> be7de5f91fdc modules: Add kernel parameter to blacklist modules
>>> as a way to overcome broken initrd generation afaics.
>>
>> Not the generation of the initramfs, but blocking a module loaded during the
>> boot. The installation may have failed and there's no easy way to then debug
>> what module was responsible for the failure.
>
> if you are using initrd, then *inside* the initrd you should have the
> /etc/modprobe.d/* file
> you want. I said "broken initrd generation" because the tool should
> put the file there, and
> apparently for you it isn't.
>
> Even if you don't have the file, you could use modprobe.blacklist= and
> let the blacklist happen
> in the userspace library rather than in the kernel. Module loading is
> not like firmware loading
> that happens without help from userspace.
>
>>
>> Either kernel
>>> command line (using modprobe.blacklist)
>>> or /etc/modprobe.d options are honoured by libkmod and allow a
>>> sufficiently privileged user to bypass it.
>>>
>>
>> Both of those options only work if the filesystem is mounted IIRC. It could be
>> the case that modprobe.blacklist now works earlier in the boot, however, I've
>> never used it because module_blacklist is the biggest and best hammer that I can
>> use to get through a broken installation or boot.
>>
>> In any case you're incorrectly assuming that the system has a filesystem on it.
>> That's not necessarily the case as I'll explain below.
>>
>>> +Rusty, +Prarit: is there anything this module parameter is covering
>>> that I'm missing?
>>
>> This is the situation I have repeatedly come across : A system at a remote site
>> will not boot any flavor of Linux. Since the system would not install the only
>> way to debug was to provide install images to workaround a module load failure.
>> As you can imagine that is a time consuming process as well as a bad end user
>> experience.
>>
>> I got so sick of it that I wrote the code above (well similar to it anyway --
>> thanks for the review Randy ;)) to add the module_blacklist parameter to make
>> debugging an uninstallable "bricked" system easier.
>>
>> It's come in handy in some unexpected ways. We've had situations where modules
>> have loaded and corrupted memory and blacklisting them revealed that the modules
>> were responsible for the memory corruption.
>
> ok... but this seems a reimplementation of modprobe.blacklist= option
> in the kernel command line,
> but in kernel space, with no way to remove it after the kernel is booted.
>
That's *EXACTLY* what I want. I do not want modprobe (because of some
misconfiguration) to load the module I've explicitly blacklisted.
P.
> Lucas De Marchi
>
>>
>> P.
>>
>>>
>>> For the changes here,
>>>
>>> Reviewed-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>>>
>>> thanks
>>> Lucas De Marchi
>>>
>>>> goto free_copy;
>>>> }
>>>>
>>>> --
>>>> 2.27.0
>>>>
>>>
>>
>