On Thu, 25 Feb 2021 23:19:31 +0100,
Anton Yakovlev wrote:
On 25.02.2021 21:30, Takashi Iwai wrote:
On Thu, 25 Feb 2021 20:02:50 +0100,
Michael S. Tsirkin wrote:
On Thu, Feb 25, 2021 at 01:51:16PM +0100, Takashi Iwai wrote:
On Thu, 25 Feb 2021 13:14:37 +0100,
Anton Yakovlev wrote:
[snip]
Takashi, given that this was in my tree for a while, I planned to
merge it this merge window.
Hmm, that's too quick, I'm afraid. I still see a few rough edges in
the code, e.g. the reset work should be canceled at driver removal,
but that's missing right now. And that'll become tricky because the
reset work itself unbinds the device, hence it'll get stuck if
cancel_work_sync() is called from the remove callback.
Yes, you made a good point here! In this case, we need some external
mutex for synchronization. This is just a rough idea, but maybe
something like this might work:
struct reset_work {
    struct mutex mutex;
    struct work_struct work;
    struct virtio_snd *snd;
    bool resetting;
};

static struct reset_work reset_works[SNDRV_CARDS];

init()
    // init mutexes and workers

virtsnd_probe()
    snd_card_new(snd->card)
    reset_works[snd->card->number].snd = snd;

virtsnd_remove()
    mutex_lock(reset_works[snd->card->number].mutex)
    reset_works[snd->card->number].snd = NULL;
    resetting = reset_works[snd->card->number].resetting;
    mutex_unlock(reset_works[snd->card->number].mutex)
    if (!resetting)
        // cancel worker reset_works[snd->card->number].work
    // remove device

virtsnd_reset_fn(work)
    mutex_lock(work->mutex)
    if (!work->snd)
        // do nothing and take an exit path
    work->resetting = true;
    mutex_unlock(work->mutex)
    device_reprobe()
    work->resetting = false;

interrupt_handler()
    schedule_work(reset_works[snd->card->number].work);

What do you think?
I think it's still somewhat racy. Suppose that the reset work is
already running right before virtsnd_remove() is entered: it sets the
reset_works[].resetting flag, virtsnd_remove() skips the cancel, and
both the reset work and virtsnd_remove() run at the very same time.
(I don't know whether this may happen, but I assume it's possible.)
In that case, maybe a better approach is to check current_work(), and
call cancel_work_sync() unless it's &reset_works[].work itself.
Then the recursive cancel call can be avoided.
After that point, the reset must have completed, and we can (again)
proceed with the rest of the release procedure. (But the snd object
itself might also have changed again, so it needs to be re-evaluated.)
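
As a rough, untested illustration of the current_work() check: the
sketch below is plain userspace C, so struct work_struct,
current_work() and cancel_work_sync() are small stand-ins for the
kernel workqueue API, and mock_current and virtsnd_cancel_reset() are
hypothetical names made up for this example. It only shows the
decision in the remove path, not the re-evaluation of the snd object
afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel workqueue API, so that the sketch
 * compiles on its own. In the kernel these come from <linux/workqueue.h>. */
struct work_struct {
	int cancel_calls;	/* how often the mock cancel was invoked */
};

/* Work item "executing" on the current thread, NULL if none (mock only). */
static struct work_struct *mock_current;

static struct work_struct *current_work(void)
{
	return mock_current;
}

static bool cancel_work_sync(struct work_struct *work)
{
	/* The real call waits for the handler to finish, so calling it
	 * from inside that very handler would never return. */
	assert(current_work() != work);
	work->cancel_calls++;
	return false;
}

/* Hypothetical helper for the remove path: cancel the reset work unless
 * we are running inside the reset work itself.
 * Returns true if cancel_work_sync() was called. */
static bool virtsnd_cancel_reset(struct work_struct *reset_work)
{
	if (current_work() == reset_work)
		return false;	/* remove was triggered by the reset work */

	cancel_work_sync(reset_work);
	return true;
}
```

When remove is reached via the reset work's own unbind, the helper
returns without waiting; in every other context it waits for a running
reset to finish first.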
One remaining concern is that the card number of the sound instance
may change after reprobe. That is, we may want to use another
persistent object instead of accessing an array indexed by the sound
card number. So, we might need the reset work to be associated with
the virtio_snd object instead.
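
One possible reading of this, as a minimal sketch only: embed the work
item in the driver object and recover the owner with container_of(),
so that nothing depends on the card number. This is again userspace C;
struct work_struct and container_of() are stand-ins for the kernel
versions, and the trimmed-down struct virtio_snd layout and
reset_work_to_snd() are assumptions for illustration. It does not
address how long the virtio_snd object itself lives across a reprobe.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins so that the sketch compiles on its own; the kernel
 * provides struct work_struct and container_of(). */
struct work_struct {
	int pending;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down driver object: the reset work lives inside
 * it instead of in a table indexed by the card number. */
struct virtio_snd {
	int card_number;		/* may differ after a reprobe */
	struct work_struct reset_work;
};

/* Map the work item handed to the reset handler back to its owner. */
static struct virtio_snd *reset_work_to_snd(struct work_struct *work)
{
	assert(work != NULL);
	return container_of(work, struct virtio_snd, reset_work);
}
```

The lookup depends only on the address of the embedded work item, so a
changed card number does not affect it.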
Anyway, this is damn complex. I sincerely hope that we can avoid
this kind of thing. Wouldn't it be better to shift the reset stuff
up to the virtio core layer? Or drop the feature in the first
version? Shooting oneself (and revival) is a dangerous magic spell,
after all.
thanks,
Takashi