Re: Bug in disk event polling

From: Rafael J. Wysocki
Date: Sun Feb 12 2012 - 16:22:29 EST


On Sunday, February 12, 2012, Alan Stern wrote:
> On Fri, 10 Feb 2012, Tejun Heo wrote:
>
> > Hello,
> >
> > On Fri, Feb 10, 2012 at 04:44:48PM -0500, Alan Stern wrote:
> > > > I think it should be nrt. It assumes that no one else is running it
> > > > concurrently; otherwise, multiple CPUs could jump into
> > > > disk->fops->check_events() concurrently which can be pretty ugly.
> > >
> > > Come to mention it, how can a single work item ever run on more than
> > > one CPU concurrently? Are you concerned about cases where some other
> > > thread requeues the work item while it is executing?
> >
> > Yeah, there are multiple paths which may queue the work item. For
> > polling work, it definitely was possible but maybe locking changes
> > afterwards removed that. Even then, it would be better to use nrt wq
> > as bug caused that way would be very difficult to track down.
>
> Okay, I'll create a new workqueue for this purpose.
>
>
> > > The problem is that these async threads generally aren't freezable.
> > > They will continue to run and do I/O while a system goes through a
> > > sleep transition. How should this be handled?
> >
> > I think it would be better to use wq for most kthreads. A lot of them
> > aren't strictly correct in the way they deal with
> > kthread_should_stop() and freezing. kthread in general simply seems
> > way too difficult to use correctly.
>
> Maybe so, but getting rid of it at this point would be a big change.
> Also, kthreads were originally considered more suitable for tasks that
> would need to run for a long time; is this no longer true?
>
> > > kthread_run() can be adjusted on a case-by-case basis, by inserting
> > > calls to set_freezable() and try_to_freeze() at the appropriate places.
> > > But what about async_schedule()?
> >
> > Given the stuff async is used for, maybe just make all async execution
> > freezable?
>
> That probably won't work. What if a driver relies on async thread
> execution to carry out its I/O?

Well, we use async in the suspend code itself. :-)

> As another example, sd_probe() calls async_schedule(sd_probe_async,...)
> to handle the long-running parts of probing a SCSI disk. In turn,
> sd_remove() calls async_synchronize_full() to insure that probing is
> over before the device is unbound from sd.
>
> What happens if a hot-unpluggable disk drive is unplugged while the
> system is asleep? It's entirely possible that the bus subsystem's
> resume routine would see the device was gone and would try to
> unregister it. Then sd_remove would wait for the async thread
> to finish, which would never happen because the thread would be frozen
> and wouldn't be thawed until all the resume routines had finished.
>
> In this case, the proper solution is to have the SCSI prepare method
> call async_synchronize_full(). Other cases will require other
> solutions.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/