Re: edac KASAN warning in experimental arm64 allmodconfig boot

From: Borislav Petkov
Date: Mon Oct 14 2019 - 12:09:12 EST


On Mon, Oct 14, 2019 at 04:18:49PM +0100, John Garry wrote:
> Hi guys,
>
> I'm experimenting by trying to boot an allmodconfig arm64 kernel, as
> mentioned here:
> https://lore.kernel.org/linux-arm-kernel/507325a3-030e-2843-0f46-7e18c60257de@xxxxxxxxxx/
>
> One thing that I noticed - it's hard to miss actually - is the amount of
> complaining from KASAN about the EDAC/ghes code. Maybe this is a red herring
> I should not care about, or maybe it is something genuine. Let me know
> what you think.
>
> The kernel is v5.4-rc3, and I raised the EDAC mc debug level to get extra
> debug prints.
>
> Log below. Thanks,
> John
> Log snippet (I cut off after the first KASAN warning):
>
> [ 70.471011][ T1] random: get_random_u32 called from new_slab+0x360/0x698 with crng_init=0
> [ 70.478671][ T1] [Firmware Bug]: APEI: Invalid bit width + offset in GAR [0x94110034/64/0/3/0]
> [ 70.526585][ T1] EDAC DEBUG: edac_mc_alloc: allocating 3524 bytes for mci data (32 dimms, 32 csrows/channels)
> [ 70.542013][ T1] EDAC DEBUG: ghes_edac_dmidecode: DIMM2: Registered-DDR4 size = 16384 MB(ECC)
> [ 70.551044][ T1] EDAC DEBUG: ghes_edac_dmidecode: type 26, detail 0x2080, width 72(total 64)
> [ 70.559986][ T1] EDAC DEBUG: edac_mc_add_mc_with_groups:
> [ 70.567082][ T1] EDAC DEBUG: edac_create_sysfs_mci_device: device mc0 created
> [ 70.575608][ T1] EDAC DEBUG: edac_create_dimm_object: device dimm2 created at location memory 2
> [ 70.585818][ T1] EDAC DEBUG: edac_create_csrow_object: device csrow2 created
> [ 70.594110][ T1] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
> [ 70.605936][ T1] EDAC DEBUG: edac_mc_del_mc:
> [ 70.611188][ T1] EDAC DEBUG: edac_remove_sysfs_mci_device:
> [ 70.619443][ T1] random: get_random_u32 called from kobject_put+0x8c/0x190 with crng_init=0
> [ 70.628163][ T1] kobject: 'csrow2' ((____ptrval____)): kobject_release, parent (____ptrval____) (delayed 750)
> [ 70.638477][ T1] EDAC DEBUG: edac_remove_sysfs_mci_device: unregistering device dimm2
> [ 70.647903][ T1] kobject: 'dimm2' ((____ptrval____)): kobject_release, parent (____ptrval____) (delayed 250)
> [ 70.658105][ T1] EDAC MC: Removed device 0 for ghes_edac.c ghes_edac: DEV ghes
> [ 70.665673][ T1] EDAC DEBUG: edac_mc_free:
> [ 70.670211][ T1] EDAC DEBUG: edac_unregister_sysfs: unregistering device mc0
> [ 70.679027][ T1] kobject: 'mc0' ((____ptrval____)): kobject_release, parent (____ptrval____) (delayed 500)
> [ 70.690987][ T1] EDAC DEBUG: edac_mc_del_mc:
> [ 70.695769][ T1] EDAC DEBUG: edac_mc_free:
> [ 70.700412][ T1] ------------[ cut here ]------------
> [ 70.705832][ T1] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x48
> [ 70.716663][ T1] WARNING: CPU: 50 PID: 1 at lib/debugobjects.c:484 debug_print_object+0xec/0x130

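If I am parsing these unwrapped messages correctly (btw, please use a
different mail client for pasting log lines; Thunderbird is usually OK,
but I guess it needs to be configured properly), the object being freed
must be some workqueue object of sorts: a timer_list whose hint is
delayed_work_timer_fn, i.e. the timer embedded in a delayed_work.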
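To illustrate what "free active" means, here is a toy Python model. It is
not the lib/debugobjects.c API; ToyTimer, ToyTracker and their methods are
made-up names. A timer counts as active from the moment it is armed until it
fires or is cancelled, and freeing memory that still contains an active timer
is what ODEBUG flags, because the timer callback would later run on freed
memory.

```python
# Toy model (hypothetical names, not the lib/debugobjects.c API) of the
# "free active" check: memory that still contains an armed timer must not
# be freed.


class ToyTimer:
    """Stand-in for the timer_list embedded in a delayed_work."""

    def __init__(self):
        self.active = False  # armed and not yet fired or cancelled

    def arm(self):
        self.active = True

    def fire_or_cancel(self):
        self.active = False


class ToyTracker:
    """Records a warning whenever an object is freed with an armed timer."""

    def __init__(self):
        self.warnings = []

    def free(self, name, timer):
        if timer.active:
            self.warnings.append(
                f"ODEBUG (model): free active object '{name}', type: timer_list"
            )
        # The memory is gone either way; a timer that is still armed would
        # later fire on freed memory, which is what the real check catches.


tracker = ToyTracker()

ok = ToyTimer()
ok.arm()
ok.fire_or_cancel()  # timer finished before the owner frees the memory
tracker.free("ok", ok)

bad = ToyTimer()
bad.arm()  # still pending when the owner frees the memory
tracker.free("bad", bad)

for w in tracker.warnings:
    print(w)
```

Only the second free is reported, because its timer was still armed.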

Now, ghes_edac doesn't initialize the EDAC MC polling workqueue:

[ 70.594110][ T1] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)

as it runs in interrupt mode rather than polling mode.

So the only other workqueue I see is the "delayed XXX" work which is
scheduled in kobject_release().
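As far as I can tell from lib/kobject.c in v5.4, with
CONFIG_DEBUG_KOBJECT_RELEASE=y kobject_release() does not clean up right
away. Instead it schedules the cleanup as delayed work after
HZ + HZ * (get_random_int() & 0x3) jiffies. A quick sketch of that
arithmetic, assuming HZ=250 (check your .config), reproduces the
"delayed 250/500/750" values in the log:

```python
# Sketch of the CONFIG_DEBUG_KOBJECT_RELEASE delay arithmetic, modelled on
# HZ + HZ * (get_random_int() & 0x3) in lib/kobject.c (v5.4, as I read it).
# HZ = 250 is an assumption; use the value from your own .config.

HZ = 250


def release_delay(hz, rnd):
    """Jiffies before the delayed kobject cleanup runs, for random value rnd."""
    return hz + hz * (rnd & 0x3)


def delay_ms(hz, jiffies):
    """Convert jiffies to milliseconds for the given HZ."""
    return jiffies * 1000 // hz


# Only the low two bits of the random value matter, so 16 samples cover all cases.
possible = sorted({release_delay(HZ, r) for r in range(16)})
print("possible delays (jiffies):", possible)
print("possible delays (ms):", [delay_ms(HZ, d) for d in possible])
```

Since the delay is HZ * (1 + k) jiffies for k in 0..3, the cleanup always
runs one to four seconds later, whatever HZ is. During that window the
delayed_work (and its timer_list) embedded in the kobject is still pending,
so freeing the structure that contains the kobject inside that window would
produce exactly the "free active ... timer_list hint: delayed_work_timer_fn"
warning from the log.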

AFAICT.

Do you have CONFIG_DEBUG_KOBJECT_RELEASE enabled? If so, does the
warning go away when you disable it?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette