A test kernel with the options set below revealed several issues when
removing a mci device:
DEBUG_TEST_DRIVER_REMOVE
KASAN
DEBUG_KMEMLEAK
Issues seen:
1) Use-after-free:
On 27.11.19 17:07:33, John Garry wrote:
[ 22.104498] BUG: KASAN: use-after-free in
edac_remove_sysfs_mci_device+0x148/0x180
The use-after-free is triggered in edac_remove_sysfs_mci_device(). It
became an issue with commit c498afaf7df8 ("EDAC: Introduce an
mci_for_each_dimm() iterator").
The reason for it is that device_unregister(&dimm->dev) not only
removes the sysfs entry, it also frees the dimm struct in
dimm_attr_release(). When incrementing the loop in
mci_for_each_dimm(), the dimm struct is accessed again by the loop
iterator which causes the use-after-free.
In function edac_remove_sysfs_mci_device() all the mci device's
subsequent dimm and csrow objects are removed. When unregistering from
sysfs, instead of removing that data it should be kept until it is
removed together with the mci device. This keeps the data structures
intact and the mci device can be fully used until it will be removed.
2) Memory leaks:
Following memory leaks have been detected:
# grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows
16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels
16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn]
1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms
34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()
There are two implementions for device removal in the driver. One is
used before edac_mc_add_mc(), the other afterwards after the device
had been registered in sysfs. The later lacks the removal of some data
allocated in edac_mc_alloc(). All the above issues are fixed as
follows:
Unify release code in a single mci_release() function and use this one
together with put_device() to release the struct mci once there are no
users. Free all subsequent data structures of the children devices in
that release function too. An effect of this is that no data is freed
in edac_mc_sysfs.c (except the "mc" sysfs root node). All sysfs
entries have the mci device as a parent, so its refcount will keep the
struct from being removed as long as sysfs entries exist. Before
freeing struct mci, all sysfs entries are removed now in edac_remove_
sysfs_mci_device(). With the changes made the mci_for_each_dimm() loop
is now save to remove dimm devices from sysfs.
The patch has been tested with the above kernel options, no issues
seen any longer.
This patch should be marked as stable.
Reported-by: John Garry <john.garry@xxxxxxxxxx>