Re: [PATCH] Fix race in get_request()
From: Jörn Engel
Date: Fri Aug 08 2014 - 13:47:04 EST
On Fri, 8 August 2014 08:28:47 -0600, Jens Axboe wrote:
> On 08/08/2014 08:24 AM, Jens Axboe wrote:
> > On 08/07/2014 06:54 PM, Jörn Engel wrote:
> >>
> >> If __get_request() returns NULL, get_request() will call
> >> prepare_to_wait_exclusive() followed by io_schedule(). Not rechecking
> >> the sleep condition after prepare_to_wait_exclusive() leaves a race
> >> where the condition changes before prepare_to_wait_exclusive(), but
> >> not after it, so this thread never gets woken up.
> >>
> >> The race must be exceedingly hard to hit, otherwise I cannot explain how
> >> such a classic race could outlive the last millennium.
> >
> > I think that is a genuine bug; it's just extremely hard to hit in real
> > life. It has probably only ever potentially triggered in cases where
> > we are so out of memory that a blocking ~300b alloc fails, and Linux
> > generally shits itself pretty hard when it gets to that stage anyway...
> > And for the bug to be critical, you'd need this to happen for a device
> > that otherwise has no IO pending, since you'd get woken up by the next
> > completed request anyway.
>
> Actually, this can't trigger for an empty queue, since the mempool holds
> a few requests. So it should never result in a softlock; we will make
> progress. Given that we also still hold the queue spinlock (which will be
> held for a free as well), we should not be able to see a request freed
> until the prepare_to_wait() has been done. So I'm not sure there is
> an actual bug there, but I agree the code looks confusing that way.
The race I was afraid of is not on an empty queue, but one where all IO
gets completed between __get_request() and prepare_to_wait_exclusive().
That may well be hundreds of requests, so under normal circumstances
they should take far longer to complete than the race window stays
open. Something has to enlarge the window. With spin_lock_irq held,
interrupts and a preemptible kernel are off the candidate list. That
leaves hardware support, either some kind of unfair SMT cpus or a
hypervisor that fails to schedule the victim cpu's thread for a long
time and at just the right moment.
Maybe the hypervisor case is at least theoretically possible. But I
would also be happy with leaving the code as-is and at most adding a
comment for the next person.
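
For the next person, the classic pattern looks roughly like this. A
minimal sketch only, not the actual block-layer code: rq_avail() and
rq_wait are hypothetical stand-ins for the real sleep condition and
request wait queue, and the queue_lock handling is omitted.

#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(rq_wait);

/* Hypothetical stand-in for "a free request is available". */
static bool rq_avail(void)
{
	return false;	/* placeholder for the real check */
}

/* Racy: the sleep condition is only checked before queueing. */
static void wait_for_request_racy(void)
{
	DEFINE_WAIT(wait);

	if (!rq_avail()) {
		prepare_to_wait_exclusive(&rq_wait, &wait,
					  TASK_UNINTERRUPTIBLE);
		/*
		 * If the last request completes between the rq_avail()
		 * check and prepare_to_wait_exclusive(), the wakeup is
		 * lost and io_schedule() may never return.
		 */
		io_schedule();
		finish_wait(&rq_wait, &wait);
	}
}

/* Classic fix: recheck after prepare_to_wait_exclusive(). */
static void wait_for_request(void)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait_exclusive(&rq_wait, &wait,
					  TASK_UNINTERRUPTIBLE);
		if (rq_avail())
			break;
		io_schedule();
	}
	finish_wait(&rq_wait, &wait);
}

Once prepare_to_wait_exclusive() has queued the task, a wakeup from a
completing request either makes the recheck succeed or sets the task
runnable before io_schedule() actually sleeps, so the window is closed.
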
Jörn
--
The grand essentials of happiness are: something to do, something to
love, and something to hope for.
-- Allan K. Chalmers