Re: [PATCH v4 1/4] firmware: Move umh locking code into fw_load_from_user_helper()

From: Ming Lei
Date: Thu Sep 08 2016 - 21:22:55 EST


On Fri, Sep 9, 2016 at 4:11 AM, Luis R. Rodriguez <mcgrof@xxxxxxxxxx> wrote:
> On Thu, Sep 08, 2016 at 11:37:54PM +0800, Ming Lei wrote:
>> On Wed, Sep 7, 2016 at 4:45 PM, Daniel Wagner <wagi@xxxxxxxxx> wrote:
>> > From: Daniel Wagner <daniel.wagner@xxxxxxxxxxxx>
>> >
>> > When we load the firmware directly we don't need to take the umh
>> > lock.
>>
>> I am wondering if it can be wrong.
>
> If you disable the firmware UMH why would we need to lock if the lock is being
> shown only used for the firmare UMH ?
>
>> Actually in case of firmware loading, the usermode helper lock doesn't
>> only mean the user helper is usable, and it also may serve to mark the
>> filesystem/block device is ready for firmware loading, and of couse direct
>> loading need fs/block to be ready too.
>
> Yes but that's a race I've identified a while ago present even if you use initramfs *and*
> use kernel_read_file_from_path() on any part of the kernel [0], I proposed a possible

Actualy I mean the situation of suspend vs. resume, and some drivers
still may not benefit from firmware loading cache when requesting loading
in .resume(), at that time it is still too early for direct loading.
With UMH lock,
we can get a warning or avoid the issue.


> solution recently [1], given tons of folks were *complaining* about this but despite there
> being one solution proposed [2] it was still incorrect, as only *userspace* can know

In case of initramfs and built-in drivers, it isn't a surprise to see this race,
and that shouldn't be a common case. But for suspend vs. resume, it can be
true.


> for sure when your critical filesystems are mounted. Since we now have a shared
> "read file fs" helper kernel_read_file_from_path() it implicates the race is possible
> elsewhere as well.
>
> The race is present given you simply cannot know when the root filesystem
> (consider a series of pivot_root() calls, hey -- anyone can do that, who are we to
> tell them they can't), or in this particular case importance, only considering firmware,
> when /lib/firmware/ is ready. In short I am saying this whole race thing is a
> bigger issue, and as much as Linus hates my proposed solution I have not heard
> any proposal alternatives to address this properly, not even clarifications on
> what our expectations for drivers then are if they use kernel_read_file_from_path()
> early on their driver code.
>
> The original goal of the usermode helper lock came can be traced back in
> time via a144c6a ("PM: Print a warning if firmware is requested when task
> are frozen) when Rafael noticed drivers were calling to request firmware
> on *resume* calls! Why would that be an issue? It was because of the stupid
> delays incurred on resume when firmware *was not found* __due__ to the stupid
> user mode helper timeout delay and people believing this often meant users
> confusing resume *stalling* for resume was *broken! Later b298d289c7
> ("PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()")
> addressed races. That code in turn was later massaged into shape to better
> accept direct FS loading via commit 4e0c92d015235 ("firmware: Refactoring for
> splitting user-mode helper code").
>
> So for a while we've been nagging driver developers to not add request_firmware()
> calls on resume calls. In fact the drivers/base/firmware_class.c code *kills*
> firmware UMH calls when we go to suspend for the firmware cache, as such even
> on suspend its a stupid idea to use the UMH, not only because it can
> stall suspend but *also* because we kill these UMH calls. Its why I recently
> proposed removing the possibility to call the firmware UMH from the firmware
> cache [3]. If this patch is wrong please do chime in!
>
> So, as I see it the user mode helper lock should have *nothing* to do with
> the race between reading files from the kernel and having a path ready. If
> it was *used* for that, that was papering over the real issue. Its no
> different of a hack for instance as if a driver using request_firmware() tried
> to use async probe to avoid the same race. Yes it will help, but no, it does
> not mean it is deterministically safe.
>
> Reviewing commit 247bc03742545 ("PM / Sleep: Mitigate race between the freezer
> and request_firmware()") which originally extended umh state machine from
> just being enabled/disabled, with the concepts of UMH_ENABLED, UMH_FREEZING,
> UMH_DISABLED -- its goal was to prevent UMH uses during suspend. So -- the
> "UMH lock" on firmware was actually added to help avoid races between freezing
> and request_firmware(). We should not re-use UMH status notifiers when the
> firmware UMH is disabled for the same concepts -- if we needed such a concept
> then we should take this out from UMH code and generalize it.
>
> In fact this shows there's still an issue here for UMH code as well. The
> usermodehelper_enable() call
>
> rest_init() --> kernel_thread(kernel_init, NULL, CLONE_FS); -->
> kernel_init --> kernel_init_freeable() --> wait_for_completion(&kthreadd_done);
>
> So after kernel_init_freeable() kthreadd_done should be done but note that
> only userspace will know what partition set up to use. Shortly after
> this though in kernel_init_freeable() we call prepare_namespace() so if
> initramfs was set up its set up after.
>
> Meanwhile driver's inits are called via do_initcalls() called from do_basic_setup()
> and that is called within kernel_init_freeable() *prior* to prepare_namespace().
>
> So calling usermodehelper_enable() doesn't necessarily ensure that the paths
> for the UMH used will actually work either. This path also has a race.
>
> We need to address both things then:
>
> 1) *freeze* races / stalls
> 2) userspace paths ready for whatever the kernel needs to read off of the
> filesystem
>
> These issues are not particular to firmware -- this applies to all
> kernel_read_file_from_path() and as is being revealed now the UMH code. It gets
> me wondering if we have any shared code possible between UMH and
> kernel_read_file_from_path().
>
> For 1) we could just generalize the code.
>
> For 2) I proposed a solution as RFC although some hate it, my latest preference
> would be either *declare* these uses early reads clearly as not supported or
> have a proper init level that does guarantee to be safe against the userspace
> paths.
>
> I'm all ears for alternatives.
>
> [0] https://marc.info/?t=147286207700002&r=1&w=2
> [1] http://lkml.kernel.org/r/CAB=NE6UBRa0K7=PomJzKxsoj4GzAqkYrkp=O+UfVvu2fwM25pA@xxxxxxxxxxxxxx
> [2] https://lkml.org/lkml/2014/6/15/47
> [3] https://lkml.kernel.org/r/1473208930-6835-6-git-send-email-mcgrof@xxxxxxxxxx
>
> Luis