Re: Free after use in fw_pm_notify()->kill_requests_without_uevent() due pending_fw_head

From: Luis R. Rodriguez
Date: Tue Mar 14 2017 - 20:53:27 EST


On Tue, Feb 21, 2017 at 06:59:12PM -0800, Sodagudi Prasad wrote:
> On 2017-01-03 07:19, Greg KH wrote:
> > On Tue, Jan 03, 2017 at 06:44:03AM -0800, Sodagudi Prasad wrote:
> > >
> > > Hi All,
> > >
> > > Device has crashed due to memory access after free while
> > > pending_fw_head
> > > list accessed. Kernel 4.4 stable version is used to reproduce this
> > > use after
> > > free.
> > > ------------------------------------------------------------------------------------------
> > > [ 9031.178428] Unable to handle kernel paging request at virtual
> > > address
> > > 6b6b6b6b6b6b6b6b
> > > [ 9031.178508] pgd = ffffffc0de9d2000
> > > [ 9031.185888] [6b6b6b6b6b6b6b6b] *pgd=0000000000000000,
> > > *pud=0000000000000000
> > > [ 9031.253045] ------------[ cut here ]------------
> > > [ 9031.253100] Kernel BUG at ffffff800864c0a0 [verbose debug info
> > > unavailable]
> > > [ 9031.256860] Internal error: Oops - BUG: 96000004 [#1] PREEMPT SMP
> > > [ 9031.263539] Modules linked in:
> > > [ 9031.272708] CPU: 6 PID: 1373 Comm: system_server Tainted: G
> > > W L
> > > 4.4.16+ #1
> > > [ 9031.280648] task: ffffffc0d1a1d700 ti: ffffffc0d1a2c000 task.ti:
> > > ffffffc0d1a2c000
> > > [ 9031.287776] PC is at fw_pm_notify+0x84/0x19c
> > > [ 9031.295215] LR is at fw_pm_notify+0x60/0x19c
> > > [ 9031.511559] [] fw_pm_notify+0x84/0x19c
> > > [ 9031.519355] [] notifier_call_chain+0x58/0x8c
> > > [ 9031.524739] [] __blocking_notifier_call_chain+0x54/0x70
> > > [ 9031.530387] [] blocking_notifier_call_chain+0x38/0x44
> > > [ 9031.537243] [] pm_notifier_call_chain+0x28/0x48
> > > [ 9031.543662] [] pm_suspend+0x278/0x674
> > > [ 9031.549906] [] state_store+0x58/0x90
> > > [ 9031.554942] [] kobj_attr_store+0x18/0x28
> > > [ 9031.560154] [] sysfs_kf_write+0x5c/0x68
> > > [ 9031.565620] [] kernfs_fop_write+0x114/0x16c
> > > [ 9031.571092] [] __vfs_write+0x48/0xf0
> > > [ 9031.576816] [] vfs_write+0xb8/0x150
> > > [ 9031.581848] [] SyS_write+0x58/0x94
> > > [ 9031.586973] [] el0_svc_naked+0x24/0x28
> > > -----------------------------------------------------------------------------------------------
> > >
> > > Kernel panic is observed during device suspend/resume path in the
> > > kill_requests_without_uevent() called from fw_pm_notify().
> > > when pending_list of a firmware_buf is accessed 0x6b(free pattern)
> > > pattern
> > > observed. Based on this firmware_buf is freed even if firmware_buf
> > > is part
> > > of
> > > pending_fw_head list.
> >
> > What are you doing in userspace to trigger this problem? What kernel
> > driver is this happening with?
> Device continuous suspend and resume is happening here. I think, echo mem >
> /sys/power/state issued here.
> It is not clear what driver involved here, because after firmware_buf is
> freed all memory gets filled with 0x6b pattern.

It is important to find what API call was used too.

The only thing I can think of is what follows:

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index ac350c518e0c..ced46339226c 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -1005,7 +1005,6 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,

mutex_lock(&fw_lock);
list_add(&buf->pending_list, &pending_fw_head);
- mutex_unlock(&fw_lock);

if (opt_flags & FW_OPT_UEVENT) {
buf->need_uevent = true;
@@ -1015,6 +1014,7 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
} else {
timeout = MAX_JIFFY_OFFSET;
}
+ mutex_unlock(&fw_lock);

retval = fw_state_wait_timeout(&buf->fw_st, timeout);
if (retval < 0) {

The reason is that kill_requests_without_uevent() checks for the
buf->need_uevent but we modify the buf->need_uevent = true; above after we
unlock. However, I've tried creating a test case against this and ran a loop
against tools/testing/selftests/firmware/fw_fallback.sh first will all test and
then with only load_fw_custom test and a timeout of 1 second while I loop
suspend/resume and I cannot reproduce. I tried to aggregate the situation by
sprinkling ssleep(10) after the lock (without the above patch) and suspending
after but no dice. This should give you an idea of what I mean by crafting
a hack to try to force the situation if the above really is the fix.

So -- try the above patch, if that does not fix it, then please try to see if
you can create a test case or get crafty and try to force the situation around
areas you suspect might cause it, use qemu to try to force this issue and use
the test driver.

Then on a loop run the test and also do: systemctl suspend0

If using qemu you can force resume using:

echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl

where /dev/pts/7 here is my control monitor interface.

Also add to your /etc/default/grub (I know, you are probalby on Android
right... so whatever you can use to modify the kernel command line) if dynamic
debug is enabled and use:

dyndbg=\"file drivers/base/dd.c +p; file drivers/base/firmware_class.c +p;\"

This should hopefully hep narrow down that driver is doing what here.

> > And 4.4.16 is pretty old, can you try 4.9?
> We don't have system which runs on new kernels. Looking for possible
> reasons/path, how firmware_buf can get freed when that in pending_fw_head
> list.

The above is the only thing I can think of. The best is for you to try to
reproduce using the test driver and building a careful test case that tries to
explode things. The test driver is lib/test_firmware.c and the script is
tools/testing/selftests/firmware/fw_fallback.sh.

Please note I did test using CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y.

For those on the CC list: please note I was busy and I will review your patches
or questions hopefully this week. Also note that due to historic issues which
are not well tested like the fallback interface -- I am now asking folks to
draw up test cases for their new code. You can however wait until I review your
code.

Luis