Re: [PATCH] firmware: add firmware to new device's devres list for second time cache

From: Luis R. Rodriguez
Date: Wed Oct 04 2017 - 17:17:20 EST

On Tue, Aug 22, 2017 at 03:52:46PM +0800, Kai-Heng Feng wrote:
> Currently, firmware will only be chached if assign_firmware_buf() gets
> called.

True, but also more importantly we peg the fw cache to the device via devres
*iff* the firmware actually was found. We do this so that we don't try to look
for bogus firmwares or firmwares which we currently do not have on our
filesystem (consider a driver that uses a series of revisions of firmwares).

> When a device loses its power or a USB device gets plugged to another
> port under suspend, request_firmware() can still find cached firmware,
> but firmware name no longer associates with the new device's devres.
> So next time the system suspend, those firmware won't be cached.

Gah, its a bit more complicated than that. During suspend we call out to
request firmware proactively for all firmwares in our fw cache. The
fw cache is used simply to fetch the caches for firwmares during suspend
so that on resume they exist to avoid races against the filesystem. It
however uses the same functionality as the batched firwmare feature, which
in turn is used to share one buf over different requests.

If a device is unplugged its not clear to me why the old cache would
not work for the new one as its all shared, so the only thing that I
can think of is if the old device being disconnected is processed first,
and therefore releases the old cache, so when the new device is processed
it does not use the new cache. This should still not be an issue though,
unless of course real races happen with the filesystem.

As of recently though we have had new findings which indicate that the
old UMH lock was causing issues on resume on BT devices which *only*
needed firmware on resume, but not on boot, so their first fw cache
was not generated. That issue can resemble this one, in that no cache
can be present, and a race happens on resume.

The old UMH lock then was causing a failure on resume, and one of the
solutions which could have worked was a proactive "hey set this cache
up for me" even though the device didn't need the firmware. This
is no longer needed given that the UMH lock stuff is gone from
direct FS lookups and the issue should no longer be present.

That is, Linus' revert of commit f007cad159e99fa2acd3b2e9364fbb32ad28b971
("Revert "firmware: add sanity check on shutdown/suspend") I believe
should fix this issue.

I'm actually inclined to remove the fw cache stuff as I no longer see
the advantage of it given we are doing FS lookups and I see no races
possible anymore.

> Hence, we should add the firmware name to the devres when the firmware
> is found in cache, to make the firmware cacheable next time.
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> ---
> drivers/base/firmware_class.c | 4 ++++
> 1 file changed, 4 insertions(+)
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index bfbe1e154128..a99de34e3fdc 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -1177,6 +1177,10 @@ _request_firmware_prepare(struct firmware **firmware_p, const char *name,
> ret = fw_lookup_and_allocate_buf(name, &fw_cache, &buf, dbuf, size);
> + /* device might be a new one, add it to devres list */
> + if (ret == 0 || ret == 1)
> + fw_add_devm_name(device, name);
> +

Even if this was correct notice we have requests for which a FW cache is not
desired -- see FW_OPT_NOCACHE, and the above does not respect it.

Given all the above, can you test with a kernel which has
commit f007cad159e99fa2acd3b2e9364fbb32ad28b971 and tell me if you
still see the issue?