Re: Free after use in fw_pm_notify()->kill_requests_without_uevent() due pending_fw_head
From: Luis R. Rodriguez
Date: Fri Mar 31 2017 - 20:54:08 EST
On Tue, Mar 14, 2017 at 5:53 PM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> On Tue, Feb 21, 2017 at 06:59:12PM -0800, Sodagudi Prasad wrote:
>> On 2017-01-03 07:19, Greg KH wrote:
>> > On Tue, Jan 03, 2017 at 06:44:03AM -0800, Sodagudi Prasad wrote:
>> > >
>> > > Hi All,
>> > >
>> > > Device has crashed due to memory access after free while
>> > > pending_fw_head
>> > > list accessed. Kernel 4.4 stable version is used to reproduce this
>> > > use after
>> > > free.
>> > > ------------------------------------------------------------------------------------------
>> > > [ 9031.178428] Unable to handle kernel paging request at virtual
>> > > address
>> > > 6b6b6b6b6b6b6b6b
>> > > [ 9031.178508] pgd = ffffffc0de9d2000
>> > > [ 9031.185888] [6b6b6b6b6b6b6b6b] *pgd=0000000000000000,
>> > > *pud=0000000000000000
>> > > [ 9031.253045] ------------[ cut here ]------------
>> > > [ 9031.253100] Kernel BUG at ffffff800864c0a0 [verbose debug info
>> > > unavailable]
>> > > [ 9031.256860] Internal error: Oops - BUG: 96000004 [#1] PREEMPT SMP
>> > > [ 9031.263539] Modules linked in:
>> > > [ 9031.272708] CPU: 6 PID: 1373 Comm: system_server Tainted: G
>> > > W L
>> > > 4.4.16+ #1
>> > > [ 9031.280648] task: ffffffc0d1a1d700 ti: ffffffc0d1a2c000 task.ti:
>> > > ffffffc0d1a2c000
>> > > [ 9031.287776] PC is at fw_pm_notify+0x84/0x19c
>> > > [ 9031.295215] LR is at fw_pm_notify+0x60/0x19c
>> > > [ 9031.511559] [] fw_pm_notify+0x84/0x19c
>> > > [ 9031.519355] [] notifier_call_chain+0x58/0x8c
>> > > [ 9031.524739] [] __blocking_notifier_call_chain+0x54/0x70
>> > > [ 9031.530387] [] blocking_notifier_call_chain+0x38/0x44
>> > > [ 9031.537243] [] pm_notifier_call_chain+0x28/0x48
>> > > [ 9031.543662] [] pm_suspend+0x278/0x674
>> > > [ 9031.549906] [] state_store+0x58/0x90
>> > > [ 9031.554942] [] kobj_attr_store+0x18/0x28
>> > > [ 9031.560154] [] sysfs_kf_write+0x5c/0x68
>> > > [ 9031.565620] [] kernfs_fop_write+0x114/0x16c
>> > > [ 9031.571092] [] __vfs_write+0x48/0xf0
>> > > [ 9031.576816] [] vfs_write+0xb8/0x150
>> > > [ 9031.581848] [] SyS_write+0x58/0x94
>> > > [ 9031.586973] [] el0_svc_naked+0x24/0x28
>> > > -----------------------------------------------------------------------------------------------
>> > >
>> > > Kernel panic is observed during device suspend/resume path in the
>> > > kill_requests_without_uevent() called from fw_pm_notify().
>> > > when pending_list of a firmware_buf is accessed 0x6b(free pattern)
>> > > pattern
>> > > observed. Based on this firmware_buf is freed even if firmware_buf
>> > > is part
>> > > of
>> > > pending_fw_head list.
>> >
>> > What are you doing in userspace to trigger this problem? What kernel
>> > driver is this happening with?
>> Device continuous suspend and resume is happening here. I think, echo mem >
>> /sys/power/state issued here.
>> It is not clear what driver involved here, because after firmware_buf is
>> freed all memory gets filled with 0x6b pattern.
>
> It is important to find what API call was used too.
Did you find out ?
> The only thing I can think of is what follows:
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index ac350c518e0c..ced46339226c 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -1005,7 +1005,6 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
>
> mutex_lock(&fw_lock);
> list_add(&buf->pending_list, &pending_fw_head);
> - mutex_unlock(&fw_lock);
>
> if (opt_flags & FW_OPT_UEVENT) {
> buf->need_uevent = true;
> @@ -1015,6 +1014,7 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
> } else {
> timeout = MAX_JIFFY_OFFSET;
> }
> + mutex_unlock(&fw_lock);
>
> retval = fw_state_wait_timeout(&buf->fw_st, timeout);
> if (retval < 0) {
>
> The reason is that kill_requests_without_uevent() checks for the
> buf->need_uevent but we modify the buf->need_uevent = true; above after we
> unlock. However, I've tried creating a test case against this and ran a loop
> against tools/testing/selftests/firmware/fw_fallback.sh first will all test and
> then with only load_fw_custom test and a timeout of 1 second while I loop
> suspend/resume and I cannot reproduce. I tried to aggregate the situation by
> sprinkling ssleep(10) after the lock (without the above patch) and suspending
> after but no dice. This should give you an idea of what I mean by crafting
> a hack to try to force the situation if the above really is the fix.
>
> So -- try the above patch,
Did you test it?
> if that does not fix it, then please try to see if
> you can create a test case or get crafty and try to force the situation around
> areas you suspect might cause it, use qemu to try to force this issue and use
> the test driver.
Did you ?
> Then on a loop run the test and also do: systemctl suspend
>
> If using qemu you can force resume using:
>
> echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl
>
> where /dev/pts/7 here is my control monitor interface.
>
> Also add to your /etc/default/grub (I know, you are probalby on Android
> right... so whatever you can use to modify the kernel command line) if dynamic
> debug is enabled and use:
>
> dyndbg=\"file drivers/base/dd.c +p; file drivers/base/firmware_class.c +p;\"
>
> This should hopefully hep narrow down that driver is doing what here.
Did you ?
>> > And 4.4.16 is pretty old, can you try 4.9?
>> We don't have system which runs on new kernels. Looking for possible
>> reasons/path, how firmware_buf can get freed when that in pending_fw_head
>> list.
>
> The above is the only thing I can think of. The best is for you to try to
> reproduce using the test driver and building a careful test case that tries to
> explode things. The test driver is lib/test_firmware.c and the script is
> tools/testing/selftests/firmware/fw_fallback.sh.
Please note that working upstream is important, the lost touch of
doing so will keep you in the stone age. You need active developers
both testing, writing test cases. Without any of this you are working
with rather old code and without proper feedback it will be hard to
fix any issue.
Luis
> Please note I did test using CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y.
>
> For those on the CC list: please note I was busy and I will review your patches
> or questions hopefully this week. Also note that due to historic issues which
> are not well tested like the fallback interface -- I am now asking folks to
> draw up test cases for their new code. You can however wait until I review your
> code.
>
> Luis