Re: [PATCH] firmware: request_firmware() should propagate -ERESTARTSYS

From: Luis R. Rodriguez
Date: Tue May 23 2017 - 15:55:41 EST


On Tue, May 23, 2017 at 04:32:49PM +0200, Martin Fuzzey wrote:
> On 23/05/17 15:31, Greg Kroah-Hartman wrote:
> > On Tue, May 23, 2017 at 03:16:07PM +0200, Martin Fuzzey wrote:
> > > When -ERESTARTSYS is returned by wait_* due to a signal this should
> > > be returned from request_firmware() so that the syscall may be
> > > restarted if necessary.
> > >
> > >
> > > Nice find, should this go to the stable kernels as well?
>
> Yes I think it should.

Thanks for the patch !

Just a bit of more nose diving validating this.

We actually used to send -ENOMEM for a long time always, and now you are
special-casing to allow -ERESTARTSYS -- we we must ask ourselves -- why
not other errors ?

Let us consider what is upstream only please and focus on stable later.

We use:

static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
{
long ret;

ret = swait_event_interruptible_timeout(fw_st->wq,
__fw_state_is_done(READ_ONCE(fw_st->status)),
timeout);
if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
return -ENOENT;
if (!ret)
return -ETIMEDOUT;

return ret < 0 ? ret : 0;
}

What can swait_event_interruptible_timeout() return ? It can return the value
of a timeout or whatever __swait_event_interruptible_timeout() returns.
__swait_event_interruptible_timeout() in turn uses ___swait_event() as
follows:

#define __swait_event_interruptible_timeout(wq, condition, timeout) \
___swait_event(wq, ___wait_cond_timeout(condition), \
TASK_INTERRUPTIBLE, timeout, \
__ret = schedule_timeout(__ret))

So this ultimately use ___swait_event():

/* as per ___wait_event() but for swait, therefore "exclusive == 0" */
#define ___swait_event(wq, condition, state, ret, cmd) \
({ \
struct swait_queue __wait; \
long __ret = ret; \
\
INIT_LIST_HEAD(&__wait.task_list); \
for (;;) { \
long __int = prepare_to_swait_event(&wq, &__wait, state);\
\
if (condition) \
break; \
\
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
break; \
} \
\
cmd; \
} \
finish_swait(&wq, &__wait); \
__ret; \
})

And prepare_to_swait_event() can return -ERESTARTSYS on signal_pending_state()
otherwise it returns 0 ! So indeed -ERESTARTSYS is possible.

But what about other errors ? Considering the above it would seem then we
actually can only get -ERESTARTSYS or whatever schedule_timeout() returns.
schedule_timeout() is documented indicating it returns 0 when the timer has
expired otherwise the remaining time in jiffies will be returned. It further
clarifies that the return value is guaranteed to be be non-negative. And
___wait_is_interruptible() is:

#define ___wait_is_interruptible(state) \
(!__builtin_constant_p(state) || \
state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \

So this piece of code:
\
if (___wait_is_interruptible(state) && __int) { \
__ret = __int; \
break; \
} \

Since ___wait_is_interruptible() is always true for us since we use
__swait_event_interruptible_timeout() the -ERESTARTSYS will always be sent if a
signal was sent.

In all this light then the patch is correct for upstream however let's consider
stable now. At first glance this seems like a fix for an old patch,
respectively commit 0542ad88fbdd81bb ("firmware loader: Fix
_request_firmware_load() return val for fw load abort" by Shuah Khan which was
merged since v3.17, *but* back then just used wait_for_completion() and ignored
any signals here, they were just not part of our semantics. But its important
to note then we always returned -ENOMEM before that patch and then at least
returned -EAGAIN in other cases. The next thing to note is commit
5d47ec02c37ea632398c ("firmware: Correct handling of fw_state_wait() return
value") by Bjorn Andersson. This took place *after* the swait changes. Bjorn
fixed an issue but also forgot to address the special case of -ERESTARTSYS,
given right below fw_state_wait_timeout() on _request_firmware_load() the
return value would be lost. It would seem Bjorn assumed the return value would
be propagated but did not notice the error special casing below which would
loose it.

So before and after Bjorn's changes we were still *trying* to propagate the
-ERESTARTSYS error but it was still lost.

The -ERESTARTSYS from signals was still something we were capturing even prior
to the swait changes, see the kernel commit prior to the swait changes (0430cafcc4fb
"firmware: drop bit ops in favor of simple state machine"), if you git blame there,
will find a series of commits with the -ERESTARTSYS handled... I can trace back
to commit 68ff2a00dbf ("firmware_loader: handle timeout via
wait_for_completion_interruptible_timeout()") as having the -ERESTARTSYS check but
it had lost that error on _request_firmware_load() due to the :

if (retval == -ERESTARTSYS || !retval) {
mutex_lock(&fw_lock);
fw_load_abort(fw_priv);
mutex_unlock(&fw_lock);
}

if (is_fw_load_aborted(buf))
retval = -EAGAIN;
else if (!buf->data)
retval = -ENOMEM;

As noted earlier the above piece of code lost the error because of
0542ad88fbdd81bb ("firmware loader: Fix _request_firmware_load() return val for
fw load abort" which was merged since v3.17, but back then it *did not*
propagate the error. So it would be incorrect to say that your patch fixes
commit 0542ad88fbdd81bb by Shuah Khan... We'd have to ask ourselves when
such an error actually became relevant.

It would seem the -ERESTARTSYS signal took effect first via commit 0cb64249ca500
("firmware_loader: abort request if wait_for_completion is interrupted") added
upstream via v4.0 and since it was introduced the error code was lost given the
-EAGAIN overwrite added *earlier* by Shuah Khan when such error codes were not
even relevant. So prior to v4.0 were we not even aborting due to signals.

As such this as far as I can tell this is a fix for a fix for this commit.

So we should use:

Fixes: 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion is interrupted")
Cc: stable@xxxxxxxxxxxxxxx # 4.0

Also a more reflective subject and commit log would be appreciated, you can add
my Acked-by given I have also now tested it with all the test drivers and
test scripts:

==========================================================================
firmware: fix sending -ERESTARTSYS due to signal on fallback

Commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
is interrupted") added via 4.0 added support to abort the fallback mechanism
when a signal was detected and wait_for_completion_interruptible() returned
-ERESTARTSYS. Although the abort was effective we were unfortunately never
really propagating this error though and as such userspace could not know
why the abort happened.

The error code was always being lost to an even older change, commit
0542ad88fbdd81bb ("firmware loader: Fix _request_firmware_load() return val
for fw load abort") by Shuah Khan which was merged since v3.17. Back then
though we never were capturing these signals or bailing on a signal. After
this change though only only -EAGAIN was being relayed back to userspace
on non-memory errors including signals trying to interrupt our fallback
process.

It only makes sense to fix capturing -ERESTARTSYS since we were capturing
the error but when it was actually effective, since commit 0cb64249ca500
("firmware_loader: abort request if wait_for_completion is interrupted").

Only distributions relying on the fallback mechanism are impacted. An
example issue is on Android, when request_firmware() is called through
the firmware fallback mechanism -- syfs write call, sysfs .store(), and
Android sends a SIGCHLD to fail the write call -- in such cases the fact
that we failed due to a signal is lost.

Fix this and ensure we propagate -ERESTARTSYS so that handlers can whether
or not to restart write calls.

Signed-off-by: Martin Fuzzey <mfuzzey@xxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # 4.0
Acked-by: Luis R. Rodriguez <mcgrof@xxxxxxxxxx>
==========================================================================

> I have already applied a similar patch to my 4.4 tree.

You might also be intersted in commit 2e700f8d85975 ("firmware: fix usermode
helper fallback loading".

> The exact same patch won't apply since the code has changed a bit since.
>
> So I was planning on sending for -stable-4.4 once it's in mainline.

Appreciated, after this is merged of course.

Luis