Re: [PATCH v4] firmware_loader: fix use-after-free in firmware_fallback_sysfs
From: Luis Chamberlain
Date: Thu Jun 10 2021 - 18:33:09 EST
On Tue, May 18, 2021 at 8:20 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> On Tue, May 18, 2021 at 09:29:20PM +0530, Anirudh Rayabharam wrote:
> > This use-after-free happens when a fw_priv object has been freed but
> > hasn't been removed from the pending list (pending_fw_head). The next
> > time fw_load_sysfs_fallback tries to insert into the list, it ends up
> > accessing the pending_list member of the previoiusly freed fw_priv.
> >
> > The root cause here is that all code paths that abort the fw load
> > don't delete it from the pending list. For example:
> >
> > _request_firmware()
> > -> fw_abort_batch_reqs()
> > -> fw_state_aborted()
> >
> > To fix this, delete the fw_priv from the list in __fw_set_state() if
> > the new state is DONE or ABORTED. This way, all aborts will remove
> > the fw_priv from the list. Accordingly, remove calls to list_del_init
> > that were being made before calling fw_state_(aborted|done)().
> >
> > Also, in fw_load_sysfs_fallback, don't add the fw_priv to the list
> > if it is already aborted. Instead, just jump out and return early.
> >
> > Fixes: bcfbd3523f3c ("firmware: fix a double abort case with fw_load_sysfs_fallback")
> > Reported-by: syzbot+de271708674e2093097b@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Tested-by: syzbot+de271708674e2093097b@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Signed-off-by: Anirudh Rayabharam <mail@xxxxxxxxxxxxx>
> > ---
> >
> > Changes in v4:
> > Documented the reasons behind the error codes returned from
> > fw_sysfs_wait_timeout() as suggested by Luis Chamberlain.
> >
> > Changes in v3:
> > Modified the patch to incorporate suggestions by Luis Chamberlain in
> > order to fix the root cause instead of applying a "band-aid" kind of
> > fix.
> > https://lore.kernel.org/lkml/20210403013143.GV4332@xxxxxxxxxxxxxxxxxxx/
> >
> > Changes in v2:
> > 1. Fixed 1 error and 1 warning (in the commit message) reported by
> > checkpatch.pl. The error was regarding the format for referring to
> > another commit "commit <sha> ("oneline")". The warning was for line
> > longer than 75 chars.
> >
> > ---
> > drivers/base/firmware_loader/fallback.c | 46 ++++++++++++++++++-------
> > drivers/base/firmware_loader/firmware.h | 6 +++-
> > drivers/base/firmware_loader/main.c | 2 ++
> > 3 files changed, 40 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
> > index 91899d185e31..f244c7b89ba5 100644
> > --- a/drivers/base/firmware_loader/fallback.c
> > +++ b/drivers/base/firmware_loader/fallback.c
> > @@ -70,7 +70,31 @@ static inline bool fw_sysfs_loading(struct fw_priv *fw_priv)
> >
> > static inline int fw_sysfs_wait_timeout(struct fw_priv *fw_priv, long timeout)
> > {
> > - return __fw_state_wait_common(fw_priv, timeout);
> > + int ret = __fw_state_wait_common(fw_priv, timeout);
> > +
> > + /*
> > + * A signal could be sent to abort a wait. Consider Android's init
> > + * gettting a SIGCHLD, which in turn was the same process issuing the
> > + * sysfs store call for the fallback. In such cases we want to be able
> > + * to tell apart in userspace when a signal caused a failure on the
> > + * wait. In such cases we'd get -ERESTARTSYS.
> > + *
> > + * Likewise though another race can happen and abort the load earlier.
> > + *
> > + * In either case the situation is interrupted so we just inform
> > + * userspace of that and we end things right away.
> > + *
> > + * When we really time out just tell userspace it should try again,
> > + * perhaps later.
> > + */
> > + if (ret == -ERESTARTSYS || fw_state_is_aborted(fw_priv))
> > + ret = -EINTR;
> > + else if (ret == -ETIMEDOUT)
> > + ret = -EAGAIN;
>
>
> Shuah has explained to me that the only motivation on her part with
> using -EAGAIN on commit 0542ad88fbdd81bb ("firmware loader: Fix
> _request_firmware_load() return val for fw load abort") was to
> distinguish the error from -ENOMEM, and so there was no real
> reason to stick to -EAGAIN. Given -EAGAIN is used typically to
> ask user to retry, but it makes no sense in this case since the
> sysfs interface is ephemeral, I think we should do away with it
> and document this rationale.
>
> I think we should stick to use -ETIMEDOUT. Its more telling of what
> happened. And so I think just removing the check should do it, but
> augmenting the comment should suffice.
>
> Since this change is already big, it would be good for this other
> change to go in as a separate change. If you can test to ensure the
> -ETIMEDOUT does indeed get propagated that'd be appreciated.
>
> Otherwise looks good. Thanks for your patience!
Anirudh, did you get a chance to test?