Re: [PATCH] firmware_class: Avoid pending list corruption

From: Luis R. Rodriguez
Date: Wed Jan 25 2017 - 10:16:12 EST

Next message: Jeff Moyer: "Re: [PATCH v5] DAX: enable iostat for read/write"
Previous message: Sudeep Holla: "Re: [PATCH] firmware: arm_scpi: Add hardware dependencies"
In reply to: Imran Khan: "[PATCH] firmware_class: Avoid pending list corruption"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Jan 25, 2017 at 06:15:42PM +0530, Imran Khan wrote:
> Remove firmware buffer from pending list when it is freed.
> Once the buffer is free kmalloc can allocate the same slab
> object for some other user but as the buffer is still there
> in the pending list, we end up with multiple users of the
> same slab object using it in different contexts.

Thanks for your patch! A few things:

0) The fallback mechanism seems to be getting quite a bit of love these days
due to issues creeping up. Some of these are old issues, some are regressions.
We should take advantage of the lib/test_firmware.c driver to help test against
the issues folks have found, build a test case using the script in

tools/testing/selftests/firmware/fw_userhelper.sh
renamed now in the latest development tree [0] to:
tools/testing/selftests/firmware/fw_fallback.sh

and tests against it to ensure we build an atomic test unit against the issue
to easily reproduce, demo the issue, and most importantly avoid further
regressions. Can you add a test case for this against the latest development
changes? Note there there is one patch missing to be applied yet to the series
[1], its a fix which you should merge before testing with the fw_fallback.sh
script. The fix is part of a series [2] which first addresses a test unit case
against the issue expanding on lib/test_firmware.c and then the script to
re-create the issue. Please review that series as an example, if you could do
the same for this issue that would be greatly appreciated.

[0] https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/log/?h=char-misc-next
[1] https://lkml.kernel.org/r/20170123161111.5925-8-mcgrof@xxxxxxxxxx
[2] https://lkml.kernel.org/r/20170123161111.5925-1-mcgrof@xxxxxxxxxx

1) Please inspect carefully the notes on impact of the issue and fix
spelled out in the patch that fixes the issue [1], in particular see how
the commit log describes what configurations are affected, and there is
some effort to help describe the severity of the issue. If you could do
the same for your patch that would be appreciated.

> Signed-off-by: Imran Khan <kimran@xxxxxxxxxxxxxx>
> ---
> drivers/base/firmware_class.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index 4497d26..d09c1aa 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -339,6 +339,9 @@ static void __fw_free_buf(struct kref *ref)
> (unsigned int)buf->size);
>
> list_del(&buf->list);
> +#ifdef CONFIG_FW_LOADER_USER_HELPER
> + list_del(&buf->pending_list);
> +#endif
> spin_unlock(&fwc->lock);
>
> #ifdef CONFIG_FW_LOADER_USER_HELPER

And finally patch review!

The pending_list is only used for the fallback mechanism. The fallback
mechanism uses a state machine to help us keep track of the state of
the buffer since we are allowing userspace to take time to fill it up
or cancel a request. The kernel can also cancel the request if we run
out of memory as we setup things, reboot, or suspend. Lastly upon success
the kernel will list_del_init(&fw_buf->pending_list) -- refer to
firmware_loading_store() when a 0 is echo'd in.

Instead of blindly list_del()'ing the buf I'd like to understand where we
are missing a list_del for the rest of the cases to consider. When we
cancel (see firmware_loading_store() when a -1 is echo'd in) we call
fw_load_abort() which will list_del_init(&buf->pending_list) for us
provided the state is not aborted (fw_state_is_done()).

Upon shutdown fw_shutdown_notify() will also call __fw_load_abort() for all list
elements. Upon suspend kill_requests_without_uevent() will fw_load_abort()
only on requests which do not rely on uevents, this is because upon suspend
firmware_class.c uses a firmware cache for requests which do rely on uevents
for a fallback mechanism. We *cancel* non-uevents request to prevent stalling
suspend.

We *add* an element into the pending list only once we know we were able to
create the temporary device used for the sysfs load on _request_firmware_load().
If the initial setup fails we call fw_load_abort().

Given your patch then when do we not remove the request from the pending list
when we also fw_free_buf()?

It would seem two cases:

o release_firmware() -- but note that the state machine for the firmware
should have coevered all cases: cancellation, successful loading
o upon resume: __device_uncache_fw_images() -- likewise

I cannot see where the issue is then, could the issue have been related to the
state machine issue which we now have a fix for [1] ? Could you test / confirm ?

Luis

Next message: Jeff Moyer: "Re: [PATCH v5] DAX: enable iostat for read/write"
Previous message: Sudeep Holla: "Re: [PATCH] firmware: arm_scpi: Add hardware dependencies"
In reply to: Imran Khan: "[PATCH] firmware_class: Avoid pending list corruption"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]