Re: [PATCH] zram: do not lookup algorithm in backends table

From: Sergey Senozhatsky
Date: Wed Jun 22 2022 - 11:20:13 EST


On (22/06/22 11:35), Sergey Senozhatsky wrote:
> Always use crypto_has_comp() so that crypto can lookup module,
> call usermodhelper to load the modules, wait for usermodhelper
> to finish and so on. Otherwise crypto will do all of these steps
> under CPU hot-plug lock and this looks like too much stuff to
> handle under the CPU hot-plug lock. Besides this can end up in
> a deadlock when usermodhelper triggers a code path that attempts
> to lock the CPU hot-plug lock, that zram already holds.

And we think that we (not exactly "we", our partners) actually
see a deadlock. It goes something like this:

- path A. zram grabs CPU hot-plug lock, execs /sbin/modprobe from crypto
and waits for modprobe to finish

disksize_store
zcomp_create
__cpuhp_state_add_instance
__cpuhp_state_add_instance_cpuslocked
zcomp_cpu_up_prepare
crypto_alloc_base
crypto_alg_mod_lookup
call_usermodehelper_exec
wait_for_completion_killable
do_wait_for_common
schedule

- path B. async work kthread that brings in scsi device. It wants to
register CPUHP states at some point, and it needs the CPU hot-plug
lock for that, which is owned by zram.

async_run_entry_fn
scsi_probe_and_add_lun
scsi_mq_alloc_queue
blk_mq_init_queue
blk_mq_init_allocated_queue
blk_mq_realloc_hw_ctxs
__cpuhp_state_add_instance
__cpuhp_state_add_instance_cpuslocked
mutex_lock
schedule

- path C. modprobe sleeps, waiting for all aync works to finish.

load_module
do_init_module
async_synchronize_full
async_synchronize_cookie_domain
schedule

And none can make any progress.

So I think we need to move crypto_alg_mod_lookup()->call_usermodehelper_exec()
out of CPU hot-plug lock and pre-load modules in advance, before we grab the
hot-plug lock.