In SCST (http://scst.sf.net) in some cases IO can be submitted asynchronously. This is possible for pass-through (i.e. using scsi_execute_async()) and BLOCKIO (i.e. using direct bio interface, see blockio_exec_rw() in http://scst.svn.sourceforge.net/viewvc/scst/trunk/scst/src/dev_handlers/scst_vdisk.c?revision=614&view=markup) backend. For them there's no need to have a per device pool of threads, one or more global thread(s) can perfectly do all the work. But it is very desirable for performance that all the IO is submitted in a dedicated IO context for each initiator (i.e. client), which originated it. I.e. commands from initiator 1 submitted in IO context IOC1, from initiator 2 - IOC2, etc. Most likely, the same approach would be very useful for NFS server as well.
To achieve that it is necessary to have a possibility to switch IO context of the threads on the fly. I tried to implement that (see the attached patch), but hit BUG_ON(!cic->dead_key) in cic_free_func(), when session for initiator with the corresponding IO context was being destroyed by scst_free_tgt_dev(). At that point it was guaranteed that there was no outstanding IO with this IO context.
So, I had to go to a more defensive approach to have for each pool of threads, including threads for async. IO, a dedicated IO context, which is currently implemented.
Could you advice please what was going wrong? What should I do to achieve what's desired?
I think the problem may be that cfq expects cfq_exit_io_context()
to be called before the last reference to an io context is put.
Since cfq_exit_io_context() is called during process exit, and AFAICT
you are not calling exit_io_context() on the given ioc, cfq finds it
in an incorrect state.
I haven't seen the rest of the code, so I may be wrong, but I suppose
that a better approach would be to use CLONE_IO to share io contexts,
if possible.