Re: [PATCH] firmware: request_firmware() should propagate -ERESTARTSYS

From: Luis R. Rodriguez
Date: Wed May 24 2017 - 16:57:05 EST


On Tue, May 23, 2017 at 09:55:33PM +0200, Luis R. Rodriguez wrote:
> On Tue, May 23, 2017 at 04:32:49PM +0200, Martin Fuzzey wrote:
> > On 23/05/17 15:31, Greg Kroah-Hartman wrote:
> > > On Tue, May 23, 2017 at 03:16:07PM +0200, Martin Fuzzey wrote:
> > > > When -ERESTARTSYS is returned by wait_* due to a signal this should
> > > > be returned from request_firmware() so that the syscall may be
> > > > restarted if necessary.
> > > >
> > > >
> > > > Nice find, should this go to the stable kernels as well?
> >
> > Yes I think it should.
>
> Thanks for the patch !
>
> Just a bit of more nose diving validating this.
>
> We actually used to send -ENOMEM for a long time always, and now you are
> special-casing to allow -ERESTARTSYS -- we we must ask ourselves -- why
> not other errors ?
>
> Let us consider what is upstream only please and focus on stable later.
>
> We use:
>
> static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
> {
> long ret;
>
> ret = swait_event_interruptible_timeout(fw_st->wq,
> __fw_state_is_done(READ_ONCE(fw_st->status)),
> timeout);
> if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> return -ENOENT;
> if (!ret)
> return -ETIMEDOUT;
>
> return ret < 0 ? ret : 0;
> }
>
> What can swait_event_interruptible_timeout() return ? It can return the value
> of a timeout or whatever __swait_event_interruptible_timeout() returns.
> __swait_event_interruptible_timeout() in turn uses ___swait_event() as
> follows:
>
> #define __swait_event_interruptible_timeout(wq, condition, timeout) \
> ___swait_event(wq, ___wait_cond_timeout(condition), \
> TASK_INTERRUPTIBLE, timeout, \
> __ret = schedule_timeout(__ret))
>
> So this ultimately use ___swait_event():
>
> /* as per ___wait_event() but for swait, therefore "exclusive == 0" */
> #define ___swait_event(wq, condition, state, ret, cmd) \
> ({ \
> struct swait_queue __wait; \
> long __ret = ret; \
> \
> INIT_LIST_HEAD(&__wait.task_list); \
> for (;;) { \
> long __int = prepare_to_swait_event(&wq, &__wait, state);\
> \
> if (condition) \
> break; \
> \
> if (___wait_is_interruptible(state) && __int) { \
> __ret = __int; \
> break; \
> } \
> \
> cmd; \
> } \
> finish_swait(&wq, &__wait); \
> __ret; \
> })
>
> And prepare_to_swait_event() can return -ERESTARTSYS on signal_pending_state()
> otherwise it returns 0 ! So indeed -ERESTARTSYS is possible.
>
> But what about other errors ? Considering the above it would seem then we
> actually can only get -ERESTARTSYS or whatever schedule_timeout() returns.
> schedule_timeout() is documented indicating it returns 0 when the timer has
> expired otherwise the remaining time in jiffies will be returned. It further
> clarifies that the return value is guaranteed to be be non-negative. And
> ___wait_is_interruptible() is:
>
> #define ___wait_is_interruptible(state) \
> (!__builtin_constant_p(state) || \
> state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \
>
> So this piece of code:
> \
> if (___wait_is_interruptible(state) && __int) { \
> __ret = __int; \
> break; \
> } \
>
> Since ___wait_is_interruptible() is always true for us since we use
> __swait_event_interruptible_timeout() the -ERESTARTSYS will always be sent if a
> signal was sent.
>
> In all this light then the patch is correct for upstream however let's consider
> stable now. At first glance this seems like a fix for an old patch,
> respectively commit 0542ad88fbdd81bb ("firmware loader: Fix
> _request_firmware_load() return val for fw load abort" by Shuah Khan which was
> merged since v3.17, *but* back then just used wait_for_completion() and ignored
> any signals here, they were just not part of our semantics. But its important
> to note then we always returned -ENOMEM before that patch and then at least
> returned -EAGAIN in other cases. The next thing to note is commit
> 5d47ec02c37ea632398c ("firmware: Correct handling of fw_state_wait() return
> value") by Bjorn Andersson. This took place *after* the swait changes. Bjorn
> fixed an issue but also forgot to address the special case of -ERESTARTSYS,
> given right below fw_state_wait_timeout() on _request_firmware_load() the
> return value would be lost. It would seem Bjorn assumed the return value would
> be propagated but did not notice the error special casing below which would
> loose it.
>
> So before and after Bjorn's changes we were still *trying* to propagate the
> -ERESTARTSYS error but it was still lost.
>
> The -ERESTARTSYS from signals was still something we were capturing even prior
> to the swait changes, see the kernel commit prior to the swait changes (0430cafcc4fb
> "firmware: drop bit ops in favor of simple state machine"), if you git blame there,
> will find a series of commits with the -ERESTARTSYS handled... I can trace back
> to commit 68ff2a00dbf ("firmware_loader: handle timeout via
> wait_for_completion_interruptible_timeout()") as having the -ERESTARTSYS check but
> it had lost that error on _request_firmware_load() due to the :
>
> if (retval == -ERESTARTSYS || !retval) {
> mutex_lock(&fw_lock);
> fw_load_abort(fw_priv);
> mutex_unlock(&fw_lock);
> }
>
> if (is_fw_load_aborted(buf))
> retval = -EAGAIN;
> else if (!buf->data)
> retval = -ENOMEM;
>
> As noted earlier the above piece of code lost the error because of
> 0542ad88fbdd81bb ("firmware loader: Fix _request_firmware_load() return val for
> fw load abort" which was merged since v3.17, but back then it *did not*
> propagate the error. So it would be incorrect to say that your patch fixes
> commit 0542ad88fbdd81bb by Shuah Khan... We'd have to ask ourselves when
> such an error actually became relevant.
>
> It would seem the -ERESTARTSYS signal took effect first via commit 0cb64249ca500
> ("firmware_loader: abort request if wait_for_completion is interrupted") added
> upstream via v4.0 and since it was introduced the error code was lost given the
> -EAGAIN overwrite added *earlier* by Shuah Khan when such error codes were not
> even relevant. So prior to v4.0 were we not even aborting due to signals.
>
> As such this as far as I can tell this is a fix for a fix for this commit.
>
> So we should use:
>
> Fixes: 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion is interrupted")
> Cc: stable@xxxxxxxxxxxxxxx # 4.0
>
> Also a more reflective subject and commit log would be appreciated, you can add
> my Acked-by given I have also now tested it with all the test drivers and
> test scripts:
>
> ==========================================================================
> firmware: fix sending -ERESTARTSYS due to signal on fallback
>
> Commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
> is interrupted") added via 4.0 added support to abort the fallback mechanism
> when a signal was detected and wait_for_completion_interruptible() returned
> -ERESTARTSYS. Although the abort was effective we were unfortunately never
> really propagating this error though and as such userspace could not know
> why the abort happened.
>
> The error code was always being lost to an even older change, commit
> 0542ad88fbdd81bb ("firmware loader: Fix _request_firmware_load() return val
> for fw load abort") by Shuah Khan which was merged since v3.17. Back then
> though we never were capturing these signals or bailing on a signal. After
> this change though only only -EAGAIN was being relayed back to userspace
> on non-memory errors including signals trying to interrupt our fallback
> process.
>
> It only makes sense to fix capturing -ERESTARTSYS since we were capturing
> the error but when it was actually effective, since commit 0cb64249ca500
> ("firmware_loader: abort request if wait_for_completion is interrupted").
>
> Only distributions relying on the fallback mechanism are impacted. An
> example issue is on Android, when request_firmware() is called through
> the firmware fallback mechanism -- syfs write call, sysfs .store(), and
> Android sends a SIGCHLD to fail the write call -- in such cases the fact
> that we failed due to a signal is lost.
>
> Fix this and ensure we propagate -ERESTARTSYS so that handlers can whether
> or not to restart write calls.
>
> Signed-off-by: Martin Fuzzey <mfuzzey@xxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # 4.0
> Acked-by: Luis R. Rodriguez <mcgrof@xxxxxxxxxx>
> ==========================================================================
>
> > I have already applied a similar patch to my 4.4 tree.
>
> You might also be intersted in commit 2e700f8d85975 ("firmware: fix usermode
> helper fallback loading".
>
> > The exact same patch won't apply since the code has changed a bit since.
> >
> > So I was planning on sending for -stable-4.4 once it's in mainline.
>
> Appreciated, after this is merged of course.

I'll just send this with the commit log change myself.

Luis