Re: [PATCH 4.13 20/27] Revert "firmware: add sanity check on shutdown/suspend"
From: Linus Torvalds
Date: Wed Sep 13 2017 - 00:11:55 EST
On Tue, Sep 12, 2017 at 5:47 PM, Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
>> If reverting this commit please consider reverting also commit
>> 06a45a93e7d34a ("firmware: move umh try locks into the umh code").
>
> Ok, I can queue that revert up in my tree and will send it to Linus once
> 4.14-rc1 is out.
I want to see a _reason_ for that revert. The two have absolutely
nothing to do with each other., Reverting one is *not* a reason for
reverting the other.
Commit 06a45a93e7d34a seems to be a cleanup. The arguments in
06a45a93e7d3 ("firmware: move umh try locks into the umh code") seem
valid, and there's no real reason to worry about that FW_OPT_NOWAIT
etc for the direct-from-filesystem loading. That's simply not
sensible.
That whole FW_OPT_NOWAIT thing only makes sense for the actual
user-mode helper, so that commit actually seems to move the testing
and the logic to a place that really does make sense.
So why would we revert a commit that makes SENSE?
In contrast, the commit that already got reverted, added random
locking state logic associated with a callback that didn't actually
make any sense in that context.
Honestly, what we need here is less "let's do random changes", and
more "have actual reasons for those changes".
For example, I'm the first to tell people that they shouldn't just try
to load firmware from random contexts. If you're bringing up a device,
loading firmware might depend on *another* device that hasn't been
brought up yet, and you might end up with essentially a deadlock
waiting for IO from another device that just isn't going to happen.
But we shouldn't fix that by then adding a flag that gets set
asynchronously by some random callback, and that then causes a
workqueue entry that *could* sleep to just fail. There's no "root
cause logic" there. That's the wrong kind of random change. There may
be a reason for it in some situation, but in another situation it just
fails for no good reason.
For example, what might be senseible is to add a warning that tries to
verify that people do *not* do firmware loads from the ->resume()
callbacks. But then it should literally check *that*: it could do
something like
WARN_ON_ONCE(current == resume_thread, "Firmware loading
called synchronously during resume");
or whatever, exactly because it's obviously *not* ok to block the same
process that is going to resume all the other devices that might be
*needed* for the firmware loading.
But on the other hand, if somebody then does an independent thread to
resume their firmware, we could just block to wait for it - it
wouldn't be the same kind of chicken-and-egg issue with IO at resume
time.
So instead of "arbitrary rules", there should be things that actually
make sense.
The commit that Luis now argues for _also_ reverting makes a lot of
sense to me to keep. I'm not seeing why that should be reverted, when
the _only_ reason seems to be some spiteful "well, if you reverted one
commit, you should randomly revert another one too".
Linus