On Sat, May 20, 2017 at 01:46:56AM -0500, yi1.li@xxxxxxxxxxxxxxx wrote:I am studying at the firmware caching codes and have to say it's very complicated. :-( Here are some questions:
From: Yi Li <yi1.li@xxxxxxxxxxxxxxx>Good stuff, this series is looking good and very easy to read ! Only thing
Changes in v2:
- Rebase to Luis R. Rodriguez's 20170501-driver-data-try2
branch
- Expose DRIVER_DATA_REQ_NO_CACHE flag to public
driver_data_req_params structure, so upper drivers can ask
driver_data driver to bypass the internal caching mechanism.
This will be used for streaming and other drivers maintains
their own caching like iwlwifi.
- Add self test cases.
Yi Li (3):
firmware_class: move NO_CACHE from private to driver_data_req_params
iwlwifi: use DRIVER_DATA_REQ_NO_CACHE for driver_data
test: add no_cache to driver_data load tester
drivers/base/firmware_class.c | 16 ++++-----
drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 2 ++
include/linux/driver_data.h | 4 +++
lib/test_driver_data.c | 43 +++++++++++++++++++++++--
tools/testing/selftests/firmware/driver_data.sh | 36 +++++++++++++++++++++
5 files changed, 89 insertions(+), 12 deletions(-)
though -- the cache test is just setting up the cache and ensuring it gets set,
it doesn't really *test* the cache is functional. Can you devise a test which
does ensure the cache is functional ?
We use the cache upon suspend to cache the firmware so that upon resume a
request will use that cache, to avoid the file lookup on disk. Doing a test
with qemu suspend + resume is possible but that requires having access to
the qemu monitor interface and doing something like this to trigger a wakeup:
echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl
where /dev/pts/7 is assumed to be the PTY. This is rather complex for ksefltest,
so another option is to add a debug mode debugfs interface for firmware_class where
we can force-enable the cache mechanism without actually this being prompted by a
suspend/hibernate. Then we can just toggle this bit from a testing perspective and
excercise the caching mechanism.
So if you look at fw_pm_notify()
switch (mode) {
case PM_HIBERNATION_PREPARE:
case PM_SUSPEND_PREPARE:
case PM_RESTORE_PREPARE:
/*
* kill pending fallback requests with a custom fallback
* to avoid stalling suspend.
*/
kill_pending_fw_fallback_reqs(true);
device_cache_fw_images();
disable_firmware();
break;
kill_pending_fw_fallback_reqs(true), device_cache_fw_images(), andThis only applies to the devices have PM enabled.
disable_firmware() could be folder into a helper, then a debugfs interface
could kick that into action to put us cache mode as the
device_cache_fw_images() changes the cache state to FW_LOADER_START_CACHE, and
when this is done you'll notice that assign_firmware_buf() does:
/*
* After caching firmware image is started, let it piggyback
* on request firmware.
*/
if (!driver_data_param_nocache(&data_params->priv_params) &&
buf->fwc->state == FW_LOADER_START_CACHE) {
if (fw_cache_piggyback_on_request(buf->fw_id))
kref_get(&buf->ref);
}
Which adds an incoming request to the cache. The first request that adds this cache
entry would be triggered by device_cache_fw_images() after the cache state is enabled:
static void device_cache_fw_images(void)
{
...
fwc->state = FW_LOADER_START_CACHE;
dpm_for_each_dev(NULL, dev_cache_fw_image);
...
}
Subsequent requests then lookup for the cache through _request_firmware_prepare()
when fw_lookup_and_allocate_buf() is called. __fw_lookup_buf() really should be
renamed to something that reflects this is a cache lookup. In fact if you find
anything else that needs renaming to make it clear please feel free to send patches
for it.
We want to test that when caching is enabled, the cache is actually used.
Note that disable_firmware() above on the notifier *does* disable subsequent firmware
lookups but this is only *if* the lookup fails with _request_firmware_prepare():
_request_firmware(const struct firmware **firmware_p, const char *name,
struct driver_data_params *data_params,
struct device *device)
{
struct firmware *fw = NULL;
int ret;
if (!firmware_p)
return -EINVAL;
if (!name || name[0] == '\0') {
ret = -EINVAL;
goto out;
}
ret = _request_firmware_prepare(&fw, name, device, data_params);
if (ret <= 0) /* error or already assigned */
goto out;
if (!firmware_enabled()) {
WARN(1, "firmware request while host is not available\n");
ret = -EHOSTDOWN;
goto out;
}
...
}
The idea is that _request_firmware_prepare() will have picked up the cache and
enabled use of that while the infrastructure for disk lookups is disabled. The
caching effect is lifted later on the same notifier fw_pm_notify() and it also
schedules a clearing of the cached firmwares with device_uncache_fw_images_delay().
To me this last part smells like a possible source of issue (not sure) if we mightAgree, it might create an issue if the system is get into restore_prepare again before the device_uncache_fw_images_delay clear the cache, why we need the 10 * MSEC_PER_SEC delay? In theory, fwc->name_lock should protect the case though.
suspend/resume a lot in short period of time, this theory could be tested by toggling
on/of this debugfs interface I suggested while having requests of different types
blast in.
Understand the need to test the firmware caching part. For non-caching test case, will it be enough if we can test that the noncache setting will ban the firmware name be added to the fwc->fw_names list?
This is the sort of testing which would really help here.
Likewise, if you are extending functionality please consider ways to break it :)
and test against it. Please think about these things carefully, its what will
change the stability for the better long term of our loader infrastructure.
Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-fpga" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html