Hello, again.
On Wed, Jan 04, 2023 at 11:39:47AM -1000, Tejun Heo wrote:
2) rq_qos_add() and blkcg_activate_policy() are not atomic, if
rq_qos_exit() is done before blkcg_activate_policy(),
null-ptr-dereference can be triggered.
I'm not sure this part does. I think it'd be better to guarantee that device
destruction is blocked while these configuration operations are in progress
which can be built into blkg_conf helpers.
A bit more explanation:
Usually, this would be handled in the core - when a device goes away, its
sysfs files get shut down before stuff gets freed and the sysfs file removal
waits for in-flight operations to finish and prevents new ones from
starting, so we don't have to worry about in-flight config file operations
racing against device removal.
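To make the above concrete, here is a rough userspace model of that "drain
on removal" behavior. It is only a sketch: struct model_file and the
model_file_*() helpers are made-up names for illustration and are not the
kernel's implementation or API. The point is just that removal flips a flag
to refuse new operations and then waits for the in-flight ones.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a config file tied to a device's lifetime. */
struct model_file {
	pthread_mutex_t lock;
	pthread_cond_t drained;	/* signalled when active drops to zero */
	int active;		/* number of in-flight operations */
	bool removed;		/* set once removal has started */
};

/* Start an operation; fails once removal has begun. */
static bool model_file_get_active(struct model_file *f)
{
	bool ok;

	pthread_mutex_lock(&f->lock);
	ok = !f->removed;
	if (ok)
		f->active++;
	pthread_mutex_unlock(&f->lock);
	return ok;
}

/* Finish an operation and wake a waiting remover if we were the last. */
static void model_file_put_active(struct model_file *f)
{
	pthread_mutex_lock(&f->lock);
	if (--f->active == 0)
		pthread_cond_broadcast(&f->drained);
	pthread_mutex_unlock(&f->lock);
}

/* Block new operations, then wait for in-flight ones to finish. */
static void model_file_remove(struct model_file *f)
{
	pthread_mutex_lock(&f->lock);
	f->removed = true;
	while (f->active > 0)
		pthread_cond_wait(&f->drained, &f->lock);
	pthread_mutex_unlock(&f->lock);
}
```

Once model_file_remove() returns, whatever the operations were touching can
be freed safely, which is the guarantee the device-side files get for free.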
Here, the problem isn't solved by that because the config files live on
cgroupfs and their lifetimes are not coupled with the block devices'. So, we
need to synchronize manually. And, given that, the right place to do that is
the blkg config helpers cuz they're the ones which establish the connection
between the cgroup and block layers.
Can you please take a look at the following patchset I just posted:
https://lkml.kernel.org/r/20230105002007.157497-1-tj@xxxxxxxxxx
After that, all these configuration operations are wrapped between
blkg_conf_init() and blkg_conf_exit(), which are probably the right places to
implement the synchronization.
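For illustration only, the shape I have in mind looks roughly like the
following userspace model. Everything here is an assumption made up for the
sketch: struct model_disk, struct model_conf_ctx, the model_conf_*() helpers
and the per-disk conf_lock are hypothetical and do not reflect the actual
blkg_conf_init() / blkg_conf_exit() signatures or what the patchset does
internally.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block device. */
struct model_disk {
	pthread_mutex_t conf_lock;	/* config operations vs. deletion */
	bool dead;			/* set when deletion has started */
	int weight;			/* some per-device policy setting */
};

/* Hypothetical per-operation context, carried from init to exit. */
struct model_conf_ctx {
	struct model_disk *disk;
	bool locked;
};

static void model_conf_init(struct model_conf_ctx *ctx, struct model_disk *disk)
{
	ctx->disk = disk;
	ctx->locked = false;
}

/* Pin the device for the duration of the operation, or fail if it's gone. */
static int model_conf_prep(struct model_conf_ctx *ctx)
{
	pthread_mutex_lock(&ctx->disk->conf_lock);
	if (ctx->disk->dead) {
		pthread_mutex_unlock(&ctx->disk->conf_lock);
		return -ENODEV;
	}
	ctx->locked = true;
	return 0;
}

/* Safe to call on both the success and the failure path. */
static void model_conf_exit(struct model_conf_ctx *ctx)
{
	if (ctx->locked) {
		pthread_mutex_unlock(&ctx->disk->conf_lock);
		ctx->locked = false;
	}
}

/* Deletion takes the same lock, so it can't run in the middle of an op. */
static void model_disk_del(struct model_disk *disk)
{
	pthread_mutex_lock(&disk->conf_lock);
	disk->dead = true;
	pthread_mutex_unlock(&disk->conf_lock);
}

/* A config file handler bracketed by init and exit. */
static int model_set_weight(struct model_disk *disk, int weight)
{
	struct model_conf_ctx ctx;
	int ret;

	model_conf_init(&ctx, disk);
	ret = model_conf_prep(&ctx);
	if (!ret)
		disk->weight = weight;
	model_conf_exit(&ctx);
	return ret;
}
```

With the synchronization living in the helpers, the individual policy
handlers don't need to know about device removal at all.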
Thanks.