Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

From: Luis R. Rodriguez
Date: Fri May 26 2017 - 15:40:20 EST


On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
> On 25 May 2017 at 06:13, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >>>
> >>> Can you give a simple example of what's going on and why it matters?
> >>>
>
>
> Here is the use case in which I ran into this problem.
>
> I have a driver which does request_firmware() when a write() is done
> to a sysfs file.
>
> The write() was being done by an android init script (with the init
> interpreter "write" command).
> init, of course, forks lots of processes and some of the children die.
>
> So the scenario was the following:
>
> 1) Android init calls write() on the sysfs file
> 2) The sysfs .store() callback registered by a driver is called
> 3) The driver calls request_firmware()
> 4) request_firmware() sends the firmware load request to userspace and
> calls wait_for_completion_interruptible()

No no, this is not how the fallback system works.

Please have a look at the latest documentation revamp [0], in particular
the fallback mechanism [1], hopefully this clarifies other details you might
like to know about this first part.

Please note I *highly* encourage relying on the uevent fallback mechanism,
given the custom fallback mechanism has its own quirks and from my own review
I'd be pretty wary about using it.

[0] https://www.kernel.org/doc/html/latest/driver-api/firmware/index.html
[1] https://www.kernel.org/doc/html/latest/driver-api/firmware/fallback-mechanisms.html

Here is the correct summary:

1) Driver issues request_firmware()
2) firmware_class looks for the firmware on the local filesystem; if it
does not find it and the fallback mechanism is enabled,
firmware_class issues a uevent for the device that requested the firmware
3) Userspace is expected to be monitoring for these uevents, and based on
them should only *then* try a write() on the respective sysfs file.
4) The kernel actually uses swait_event_interruptible_timeout() as of
v4.10 due to commit 5b029624948d6 ("firmware: do not use fw_lock for
fw_state protection"); before that it used wait_for_completion_interruptible_timeout().
It uses this to *wait* for the sysfs write to complete, so it can be
woken up by it.

Please note we just uncovered another issue in firmware_class, reported by
Nicolas, where multiple cards are used and a signal of completion on success or
error is not issued [2], for which John Ewalt has suggested a fix [3].
I just reviewed this yesterday and determined the swake_up()/swake_up_all()
fix is one atomic fix for 5b029624948d6, which moved us to swait, given that
otherwise we won't wake up all waiters on the same file. The wait stuff is also
used for a purpose, not well documented, beyond the sysfs interface: if multiple
requests come in for the same file we batch these requests up as one and
wait for a signal, so we can rely on the same file fetch to write to the
pending request. Unfortunately I discovered yesterday that we have quite a
few failure paths that never sent the swake_up_all() (this would be
a completion on kernels older than v4.10). Note
though there is only one wait used and it uses:

swait_event_interruptible_timeout() >= v4.10
wait_for_completion_interruptible_timeout() older

Since this wait is interruptible, it does mean that if we have
a reboot the wait will be killed.

Since this wait also has a timeout, it will come to an end even if
userspace never does anything.

5) Two sysfs files are exposed:

/sys/$DEVPATH/loading
echo 1 > loading # starts the load, asking userspace to write the data
echo 0 > loading # indicates the load is done
echo -1 > loading # indicates an error occurred

For details see firmware_loading_store(); note we check for fw_state_is_aborted()
before proceeding.

/sys/$DEVPATH/data

For details see firmware_data_read(), note it checks for fw_state_is_done()
early on before proceeding. fw_state_is_done() includes an abort check:

static inline bool __fw_state_is_done(enum fw_status status)
{
	return status == FW_STATUS_DONE || status == FW_STATUS_ABORTED;
}

[2] https://bugzilla.kernel.org/show_bug.cgi?id=195477
[3] https://bugzilla.kernel.org/attachment.cgi?id=256493&action=diff&collapsed=&headers=1&format=raw
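To make the protocol concrete, here is a minimal userspace sketch of the
above in plain C. Purely illustrative: load_firmware() and write_str() are
made-up names, and dev_dir stands in for /sys/$DEVPATH; a real helper would
be driven by the uevent environment.

```c
#include <stdio.h>

/* Made-up helper: write a short string to dir/file. */
static int write_str(const char *dir, const char *file, const char *val)
{
	char path[4096];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dir, file);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(val, f) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f);
}

/* Made-up agent: run the loading/data handshake for one firmware file. */
int load_firmware(const char *dev_dir, const char *fw_path)
{
	char path[4096], buf[4096];
	size_t n;
	FILE *fw, *data;

	fw = fopen(fw_path, "rb");
	if (!fw)
		return -1;

	write_str(dev_dir, "loading", "1");		/* start the load */

	snprintf(path, sizeof(path), "%s/data", dev_dir);
	data = fopen(path, "wb");
	if (!data) {
		fclose(fw);
		write_str(dev_dir, "loading", "-1");	/* flag the error */
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), fw)) > 0)
		fwrite(buf, 1, n, data);		/* stream the image */
	fclose(data);
	fclose(fw);

	return write_str(dev_dir, "loading", "0");	/* load is done */
}
```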

> 5) A child dies and raises SIGCHLD

What child? The process doing the write() ?

If we have userspace doing a write() on the sysfs interface then
firmware_loading_store() gets kicked off and should actually try to complete,
unless the signal was processed early, in which case the write would bail out
early on the fw_state_is_done() check.

Otherwise I suppose the kernel can mark a signal pending for the task on the
swait, but it's not clear under what conditions exactly.

> 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal

That's for kernels older than v4.10; on >= v4.10 that would be
swait_event_interruptible_timeout(), and I confirm -ERESTARTSYS can
be sent as a return value.

> 7) request_firmware() [before this patch] translated that to -EAGAIN

Right, this is today's default error if we fail on the wait. This error would
be returned to the *driver*. It is returned by _request_firmware_load() and
sent back to the request_firmware_*() call.

Martin seems to be arguing -ERESTARTSYS should be sent back instead given
it is what the wait returned after all.

> 8) The driver (in my case) ignored this [because the firmware was not
> critical - it was for checking if a microcontroller was up to date]
> (but it could have returned it to userspace, same problem)

I agree that using -EAGAIN for a signal is incorrect and we should propagate
something more reasonable to the driver, but also given that firmware loading
is typically a bad idea on init but is acceptable on probe, it could mean a
driver propagates this to userspace.

Another issue to consider here is that drivers using request_firmware_nowait() may
not get an actual error passed down, given the lack of error semantics in
firmware_class. This will be fixed later with the driver data API, but for now
it means async callbacks can only infer what might have happened. The callback
must be revised in this light to ensure some sanity check of some sort is done.

So to say that the driver ignored it is to say that the sync call was used
for sure and we know that it ignored -EAGAIN. Is that the case?

If not, what driver was used that ignored -EAGAIN? If it was an async callback,
no error value is passed, and what you are dealing with is very likely more of
an issue with the lack of semantics in the callback and the inability to
deal with the error accordingly.

So we have three separate issues here:

1) A driver's possible lack of error propagation
2) The actual error returned by firmware_class and semantics on async callback
3) What errors we should send on sysfs files if a load is interrupted or fails

The gruesome TL;DR details are below.

=========================================================

1) A driver's possible lack of error propagation

Given 2), drivers should take care to be pedantic about how they deal with
errors. In particular, if an async callback is used, on a signal the
callback only gets NULL, so it can only infer that an
error occurred. If the driver deals with daisy-chaining files it might
mean it could skip over files without treating fatal errors accordingly.

2) The actual error returned by firmware_class and semantics on async callback

The async callbacks only get NULL for the firmware if an error was
detected; they cannot know which specific error actually occurred. Currently
only sync calls can get the actual -EAGAIN (and, if we switch this, the
respective -ERESTARTSYS). This will be fixed with the driver data API, as it
allows callbacks to get the actual error passed.

Also, since async callbacks can be used on probe, but we don't want to delay
boot, and since init + probe() are batched together, we tend not to want to
wait on probe; this is why many drivers return right away after an async
firmware call. Even though we have async probe now (and drivers can prefer
this), that still means userspace does not get the actual error value. It's
up to drivers to also be able to take the driver down on error if an async
call for firmware is used and it fails. The actual error on an async call will
only be available in the future; right now only NULL is passed.

Care must be taken to account for the above lack of current semantics.
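For illustration, the shape of the situation an async callback is left in can
be sketched in plain C (struct firmware here is a simplified stand-in, and
fw_async_cont() is a made-up callback name, not the actual
request_firmware_nowait() signature):

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct firmware. */
struct firmware {
	size_t size;
	const unsigned char *data;
};

/*
 * Made-up async continuation: on any failure the callback only sees
 * fw == NULL, so it cannot distinguish -EAGAIN from -ETIMEDOUT from
 * -ENOENT; all it can do is infer that *some* error occurred.
 */
int fw_async_cont(const struct firmware *fw, void *context)
{
	(void)context;
	if (!fw)
		return -1;	/* error: which one is unknowable here */
	/* ... consume fw->data for fw->size bytes ... */
	return 0;
}
```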

A user API question still stands as to whether we want to send -ERESTARTSYS to
drivers, to potentially send back to userspace on driver load, given this
can still be sent for sync calls today and we are sending -EAGAIN.

3) What errors we should send on sysfs files if a load is interrupted or fails

We could capture that an error happened on any of the sysfs files by checking
the firmware status: for instance the fw_state_is_aborted() check in
firmware_loading_store(), the fw_state_is_done() check in firmware_data_read(),
or the fw_state_is_done() check in firmware_data_write().

Right now the errors we pass vary. As an example, firmware_data_write()
can fail with -ENODEV if for some reason a signal was sent and we
bailed early. The -EPERM on !capable(CAP_SYS_RAWIO) should probably
be changed to -EACCES. firmware_data_read() sends -ENODEV on
error. firmware_loading_store() returns the size of the buffer passed
to us even on error -- that seems like an issue.

> The point being that, due to a signal (SIGCHLD) which has nothing to
> do with the firmware loading process, the firmware load was not done.

Who gets SIGCHLD, and how exactly does this get to swait_event_interruptible_timeout()
or wait_for_completion_interruptible_timeout()?

> Also EAGAIN is the same error used if the load request times out so it
> was impossible to distinguish the two cases.

I agree this is an issue.

> ERESTARTSYS is an internal error and is not returned to userspace.
> Instead it is handled by the linux syscall machinery which,

Not accurate; we just never pass it on from firmware_class.c on
swait_event_interruptible_timeout() error. In fact we mask all
errors to -EAGAIN, and if the timeout ran out we just send
-ETIMEDOUT. If the request was aborted early the error -ENOENT
is sent.
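A rough plain-C mirror of this masking, for clarity (fw_map_wait_error() is a
made-up name; the real logic lives in _request_firmware_load(), and the
interplay with the abort check is my reading of the description above):

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Made-up userspace mirror of how firmware_class maps the wait result
 * to the error handed back to the driver.
 * ret > 0: woken up normally; ret == 0: the timeout ran out;
 * ret < 0: interrupted (the kernel-internal -ERESTARTSYS on a signal).
 */
int fw_map_wait_error(long ret, bool aborted)
{
	if (aborted)
		return -ENOENT;		/* request was aborted early */
	if (ret == 0)
		return -ETIMEDOUT;	/* timeout ran out */
	if (ret < 0)
		return -EAGAIN;		/* all wait errors masked to -EAGAIN */
	return 0;			/* sysfs write completed */
}
```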

> after
> processing the signal either restarts (transparently to userspace) the
> syscall or returns EINTR to userspace (depending on whether the signal handler
> uses SA_RESTART - see man 7 signal)
>
>
> With this patch here is what happens:
>
> 1) Android init calls write() on the sysfs file
> 2) The sysfs .store() callback registered by a driver is called
> 3) The driver calls request_firmware()
> 4) request_firmware() sends the firmware load request to userspace and
> calls wait_for_completion_interruptible()
> 5) A child dies and raises SIGCHLD
> 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal
> 7) request_firmware() [with this patch] returns -ERESTARTSYS
> 8) The driver returns -ERESTARTSYS from its sysfs .store method

If you're talking about a custom driver of sorts that triggered
a request_firmware() call (note: sync) then yes, your earlier description
is accurate, and in this case as well the driver-specific sysfs interface
can end up returning the same error.

If this file is custom then it's up to you to decide what error you want
on that file, but for firmware_class.c I do agree on
returning an agreed-upon error so drivers can Do The Right Thing (TM).

> 9) The system call machinery invokes the signal handler
> 10) The signal handler does its stuff
> 11) Because SA_RESTART was set the system call is restarted (calling
> the sysfs .store) and we try it all again from step 2

OK, so this seems to reveal an internal working of some Android
loader of some sort. Thanks.

> Note that, on the userspace side, write() is only called once

The write is to the custom driver trigger which calls request_firmware() ?

> (the
> restart is transparent to userspace which is oblivious to all this)
> The kernel side write() (which calls .store() is called multiple times
> (so that code does need to know about this)

Even if the signal is captured and -ERESTARTSYS is sent down to a custom
sysfs file trigger which called request_firmware(), if -ERESTARTSYS
triggers userspace to try again, the old sysfs loader for the fallback
mechanism will not be available anymore, so actually I'd be surprised
if the retry designed by Android works here. It *would* have to
retry the whole thing again, starting from the custom trigger.

> >>> ERESTARTSYS and friends are highly magical, and I'm not convinced that
> >>> allowing _request_firmware_load to return -ERESTARTSYS is actually a
> >>> good idea. What if there are system calls that can't handle this
> >>> style of restart that start being restarted as a result?

Given the above explanation I see the concern now. In our case I believe a
valid concern would be sending -ERESTARTSYS to the system call finit_module(),
given request_firmware() could be called on probe, and if a stupid driver did a
sync call on probe it could potentially send -ERESTARTSYS down.

This could mean we get a loop on finit_module() if a signal is detected
on firmware_request() on every call. Is that fine? Is it expected?

> If the caller is unable to restart (for example if the driver's
> .store() callback had already done lots of stuff that couldn't be
> undone) it is free to translate -ERESTARTSYS to -EINTR before
> returning.
> But request_firmware() can't know about that.

So you seem to be suggesting the drivers should decide whether or
not to mask -ERESTARTSYS.

Do we send -ERESTARTSYS on any other paths on finit_module()?

> >>> Maybe SIGCHLD shouldn't interrupt firmware loading?
>
> I don't think there's a way of doing that without disabling all
> signals (ie using the non interruptible wait variants).

It could be an option if we deem such API is needed.

> It used to be that way (which is why I only ran into this after
> updating from an ancient 3.16 kernel to a slightly less ancient 4.4)

Indeed, and back then I think we only ever returned -ENOMEM on error!

> But there are valid reasons for wanting to be able to interrupt
> firmware loading (like being able to kill the userspace helper)

Let's call it the fallback mechanism, as we should. OK, so if the
fallback helper is called I fail to see how this is detrimental;
we'd just time out, no?

Luis