Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

From: Mathias Nyman
Date: Mon Dec 19 2016 - 05:33:15 EST


On 13.12.2016 05:21, Baolin Wang wrote:
Hi Mathias,

On 12 December 2016 at 23:52, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
On 05.12.2016 09:51, Baolin Wang wrote:

If a command event is found on the event ring during an interrupt,
we need to stop the command timer with del_timer(). Since del_timer()
can fail if the timer is running and waiting on the xHCI lock, then
it maybe get the wrong timeout command in xhci_handle_command_timeout()
if host fetched a new command and updated the xhci->current_cmd in
handle_cmd_completion(). For this situation, we need a way to signal
to the command timer that everything is fine and it should exit.


Ah, right, this could actually happen.



We should introduce a counter (xhci->current_cmd_pending) for the number
of pending commands. If we need to cancel the command timer and
del_timer()
succeeds, we decrement the number of pending commands. If del_timer()
fails,
we leave the number of pending commands alone.

For handling timeout command, in xhci_handle_command_timeout() we will
check
the counter after decrementing it, if the counter
(xhci->current_cmd_pending)
is 0, which means xhci->current_cmd is the right timeout command. If the
counter (xhci->current_cmd_pending) is greater than 0, which means current
timeout command has been handled by host and host has fetched new command
as
xhci->current_cmd, then just return and wait for new current command.


A counter like this could work.

Writing the abort bit can generate either ABORT+STOP, or just STOP
event, this seems to cover both.

quick check, case 1: timeout and cmd completion at the same time.

cpu1 cpu2

queue_command(first), p++ (=1)
queue_command(more),
--completion irq fires-- -- timer times out at same time--
handle_cmd_completion() handle_cmd_timeout(),)
lock(xhci_lock ) spin_on(xhci_lock)
del_timer() fail, p (=1, nochange)
cur_cmd = list_next(), p++ (=2)
unlock(xhci_lock)
lock(xhci_lock)
p-- (=1)
if (p > 0), exit
OK works

case 2: normal timeout case with ABORT+STOP, no race.

cpu1 cpu2

queue_command(first), p++ (=1)
queue_command(more),
handle_cmd_timeout()
p-- (P=0), don't exit
mod_timer(), p++ (P=1)
write_abort_bit()
handle_cmd_comletion(ABORT)
del_timer(), ok, p-- (p = 0)
handle_cmd_completion(STOP)
del_timer(), fail, (P=0)
handle_stopped_cmd_ring()
cur_cmd = list_next(), p++ (=1)
mod_timer()

OK, works, and same for just STOP case, with the only difference that
during handle_cmd_completion(STOP) p is decremented (p--)

Yes, that's the cases what I want to handle, thanks for your explicit
explanation.


Gave this some more thought over the weekend, and this implementation
doesn't solve the case when the last command times out and races with the
completion handler:

cpu1 cpu2

queue_command(first), p++ (=1)
--completion irq fires-- -- timer times out at same time--
handle_cmd_completion() handle_cmd_timeout(),)
lock(xhci_lock ) spin_on(xhci_lock)
del_timer() fail, p (=1, nochange)
no more commands, P (=1, nochange)
unlock(xhci_lock)
lock(xhci_lock)
p-- (=0)
p == 0, continue, even if we should not.
For this we still need to rely on checking cur_cmd == NULL in the timeout function.
(Baolus patch sets it to NULL if there are no more commands pending)

And then we could replace the whole counter with a simple check if the timeout timer
is pending in the timeout function:

xhci_handle_command_timeout()
lock()
if (!cur_cmd || timer_pending(timeout_timer)) {
unlock();
return;
}

-Mathias