On Tue, Jun 06, 2017 at 02:31:54PM -0500, Li, Yi wrote:
We use the cache upon suspend to cache the firmware so that upon resume a
request will use that cache, to avoid the file lookup on disk. Doing a test
with qemu suspend + resume is possible but that requires having access to
the qemu monitor interface and doing something like this to trigger a wakeup:
echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl
BTW if it helps for testing:
https://github.com/mcgrof/kvm-boot
I am studying at the firmware caching codes and have to say it's very
complicated. :-( Here are some questions:
It is! For this reason I tried to document it, have you read this:
https://www.kernel.org/doc/html/latest/driver-api/firmware/firmware_cache.html
This is certainly missing the notes about how some of this is loosely re-used
for possible duplicate requests though, but I think I've mentioned this on
my review of your patches so far.
1. Since device_cache_fw_images invokes dev_cache_fw_image through
dpm_for_each_dev, adding a debugfs driver to kick it can only cache firmware
for those associated with devices which has PM enabled, which do not include
the driver_data_test_device. Any suggestions?
Ah, consider adding PM to driver_data_test_device ?
May be worth updating the documentation bout this requirement too!
2. Look into dev_cache_fw_image function, devres_for_each_res will walk
through the firmware have been loaded before (through assign_firmware_buf ->
fw_add_devm_name) and add to the todo list, eventually it will create the
fw_names list. So in the test driver, we need to load the firmware once
before calling the kick?
Yes, makes sense, given its a firmware cache, it would only make sense to
cache firmware for requests already made.
An important note I made in the documentation which you did not mention
but should be important for your own sanity:
"The firmware devres entry is maintained throughout the lifetime of the device.
This means that even if you release_firmware() the firmware cache will still be
used on resume from suspend."
This means the cache will be tried even if the driver did use release_firmware()
after request_firmware().
static void device_cache_fw_images(void)This only applies to the devices have PM enabled.
{
...
fwc->state = FW_LOADER_START_CACHE;
dpm_for_each_dev(NULL, dev_cache_fw_image);
...
}
Makes sense!
device_uncache_fw_images_delay() will be called for PM_POST_RESTORE, which
means the hibernation restore got error. How about the successful restore
case, calling driver will free its firmware_buf after loading?
As it is on my latest 20170605-driver-data [0] (but should be just as yours if you
used a recent branch of mine), fw_pm_notify() uses:
static int fw_pm_notify(struct notifier_block *notify_block,
unsigned long mode, void *unused)
{
switch (mode) {
...
case PM_POST_SUSPEND:
case PM_POST_HIBERNATION:
case PM_POST_RESTORE:
/*
* In case that system sleep failed and syscore_suspend is
* not called.
*/
mutex_lock(&fw_lock);
fw_cache.state = FW_LOADER_NO_CACHE;
mutex_unlock(&fw_lock);
enable_firmware();
device_uncache_fw_images_delay(10 * MSEC_PER_SEC);
break;
}
}
So it uses also PM_POST_RESTORE. I think that covers it all ? As it
is today we handle the firmware cache for all drivers, it should be
transparent to drivers.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20170605-driver-data
To me this last part smells like a possible source of issue (not sure) if we mightAgree, it might create an issue if the system is get into restore_prepare
suspend/resume a lot in short period of time, this theory could be tested by toggling
on/of this debugfs interface I suggested while having requests of different types
blast in.
again before the device_uncache_fw_images_delay clear the cache, why we need
the 10 * MSEC_PER_SEC delay? In theory, fwc->name_lock should protect the
case though.
One possible solution may be to cancel_delayed_work(&fw_cache.work)
or cancel_delayed_work_sync(&fw_cache.work) on suspend, and then
force it to run *right away* so we clear any pending old cache.
One performance issue with this is we are clearing some possible cache we are
about to re-create, so mod_delayed_work() seems more efficient -- provided we
have accounting notes to know no new firmware requests have trickled in since
the last cache build. This seems rather complicated so a first initial approach
to just kill all known cache before suspend seems easy and fair.
BTW such a fix might be a stable candidate if you find a real behavioural
issue with the kernel !
This is the sort of testing which would really help here.Understand the need to test the firmware caching part. For non-caching test
Likewise, if you are extending functionality please consider ways to break it :)
case, will it be enough if we can test that the noncache setting will ban
the firmware name be added to the fwc->fw_names list?
Good question. Let's see... in the future a device might have two requests
with the same firmware, one with a cache enabled and one without it -- I suppose
for now we should perhaps disable a non-cache request if one already exists
with a cache request ? The code for that probably is not there... But other than
this corner case...
Yeah I think that's a reasonable test case. Such a code seems may need a debug
export symbol to allow drivers to query for this info, if so feel free to a
respective debug kconfig entry for it.
Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-fpga" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html