-----Original Message-----
From: Mathias Nyman [mailto:mathias.nyman@xxxxxxxxx]
Sent: Monday, March 21, 2016 2:46 PM
To: Rajesh Bhagat <rajesh.bhagat@xxxxxxx>; Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx>; linux-usb@xxxxxxxxxxxxxxx;
linux-kernel@xxxxxxxxxxxxxxx
Cc: gregkh@xxxxxxxxxxxxxxxxxxx; Sriram Dash <sriram.dash@xxxxxxx>
Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI command timeout
On 21.03.2016 06:18, Rajesh Bhagat wrote:
Hi
I think clearing the whole command ring is a bit too much in this case.
It may cause issues for all attached devices when one command times out.
Hi Mathias,
I understand your point, but I want to understand how the completion
handler would be called if a command times out and xhci_abort_cmd_ring()
succeeds. In that case, every caller would wait for the completion forever.
2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) ->
xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd
In our case the command timed out, so we enter xhci_handle_command_timeout()
as in case #2, but xhci_abort_cmd_ring() succeeds, so the rest of that chain,
which ends in complete(), is never executed.
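The two outcomes of the timeout path described above can be modeled in a few lines of userspace C. This is a hypothetical simulation, not driver code: `struct sim_cmd`, `sim_handle_command_timeout()` and `sim_handle_cmd_event()` are invented names that only mirror the control flow discussed in this thread. It shows why a waiter is never woken when the abort succeeds but no event follows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a queued command and the task waiting on it. */
struct sim_cmd {
	bool completed; /* true once the waiter would be woken */
};

/* Stand-in for xhci_complete_del_and_free_cmd(): wakes the waiter. */
static void sim_complete_cmd(struct sim_cmd *cmd)
{
	cmd->completed = true;
}

/*
 * Timeout path as described in the thread:
 *  - abort failed:    queue cleanup completes the command right away;
 *  - abort succeeded: nothing is completed here; the driver relies on a
 *                     later command completion event.
 */
static void sim_handle_command_timeout(struct sim_cmd *cmd, bool abort_ok)
{
	if (!abort_ok)
		sim_complete_cmd(cmd); /* xhci_cleanup_command_queue() path */
}

/* Stand-in for the event handler; only runs if the xHC posts an event. */
static void sim_handle_cmd_event(struct sim_cmd *cmd)
{
	sim_complete_cmd(cmd);
}
```

With `abort_ok` true and no event ever delivered, `completed` stays false, which corresponds to the hang reported here.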
xhci_abort_cmd_ring() writes the CA bit (CMD_RING_ABORT) to the CRCR register.
This generates a command completion event with status "command aborted" for
the pending command.
This event is then followed by a "command ring stopped" command completion event.
See xHCI specification sections 5.4.5 and 4.6.1.2.
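A rough model of the event sequence the CA write is expected to produce, following the description above. The enum values and `sim_abort_cmd_ring()` are hypothetical; the `cmd_still_pending` flag also covers the case raised in this thread where the xHC had already processed the command when the abort arrived, so only the ring-stopped event is posted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two command completion events involved. */
enum sim_event {
	SIM_EV_CMD_ABORTED,  /* "command aborted" for the pending command */
	SIM_EV_RING_STOPPED, /* "command ring stopped" */
};

/*
 * Fills out[] with the events a well-behaved xHC would post, in order,
 * after the CA bit is written to CRCR. Returns the number of events.
 */
static int sim_abort_cmd_ring(bool cmd_still_pending, enum sim_event out[2])
{
	int n = 0;

	if (cmd_still_pending)
		out[n++] = SIM_EV_CMD_ABORTED;
	out[n++] = SIM_EV_RING_STOPPED;
	return n;
}
```

Either way, a well-behaved controller always posts at least one event; the problem in this thread is a controller that posts none.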
handle_cmd_completion() checks whether cmd_comp_code == COMP_CMD_ABORT; if so,
it jumps to event_handled and calls xhci_complete_del_and_free_cmd(cmd, cmd_comp_code)
for the aborted command.
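The abort branch of handle_cmd_completion() described above, reduced to a hypothetical sketch. `sim_handle_cmd_completion()`, the `SIM_COMP_*` codes and the `type_handled` counter are invented for illustration and do not match the driver's real values. The point is that an aborted command skips the per-command-type handling but still reaches the completion call, so its waiter is woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical completion codes; not the real xHCI numeric values. */
enum sim_comp_code { SIM_COMP_SUCCESS, SIM_COMP_CMD_ABORT };

struct sim_cmd {
	bool completed;
	enum sim_comp_code status;
	int type_handled; /* counts per-command-type processing */
};

/* Stand-in for xhci_complete_del_and_free_cmd(); freeing is omitted. */
static void sim_complete_del_and_free_cmd(struct sim_cmd *cmd,
					  enum sim_comp_code code)
{
	cmd->status = code;
	cmd->completed = true;
}

static void sim_handle_cmd_completion(struct sim_cmd *cmd,
				      enum sim_comp_code cmd_comp_code)
{
	if (cmd_comp_code == SIM_COMP_CMD_ABORT)
		goto event_handled;

	/* Normal path: command-type-specific handling would run here. */
	cmd->type_handled++;

event_handled:
	sim_complete_del_and_free_cmd(cmd, cmd_comp_code);
}
```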
If the xHC already processed the aborted command, we might only get a "command
ring stopped" event. In this case handle_cmd_completion() calls
xhci_handle_stopped_cmd_ring(xhci, cmd), which turns the commands that were
tagged for abort and still remain on the command ring into NO-OP commands.
The completion callback is called for these NO-OP commands later, when we get
command completion events for them.
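The ring-stopped case can be sketched the same way. In this hypothetical model, `sim_handle_stopped_cmd_ring()` rewrites every abort-tagged command still on the ring as a NO-OP, and the waiter is woken only when a completion event for that NO-OP arrives later. All names and the fixed-size array standing in for the ring are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum sim_trb_type { SIM_TRB_ENABLE_SLOT, SIM_TRB_NOOP };

struct sim_cmd {
	enum sim_trb_type type;
	bool abort_tagged; /* marked for abort by the timeout handler */
	bool completed;    /* waiter woken */
};

/* Turn tagged commands that remain on the ring into NO-OPs. */
static void sim_handle_stopped_cmd_ring(struct sim_cmd *ring, int n)
{
	for (int i = 0; i < n; i++) {
		if (ring[i].abort_tagged)
			ring[i].type = SIM_TRB_NOOP;
	}
}

/* Later, the xHC executes the NO-OP and posts a completion event for it. */
static void sim_handle_noop_completion(struct sim_cmd *cmd)
{
	assert(cmd->type == SIM_TRB_NOOP);
	cmd->completed = true;
}
```

Here too, the wake-up depends on the controller actually posting that later event.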
Thanks, Mathias, for the detailed explanation. Now I understand how the completion
handler is supposed to be called in this scenario.
But in our case, we somehow get no event at all, and handle_cmd_completion()
is never called, even though xhci_abort_cmd_ring() succeeded after the command timed out.
My point is that before the patch "xhci: rework command timeout and cancellation",
xhci_alloc_dev() itself would have returned if the command timed out:
-	/* XXX: how much time for xHC slot assignment? */
-	timeleft = wait_for_completion_interruptible_timeout(
-			command->completion,
-			XHCI_CMD_DEFAULT_TIMEOUT);
-	if (timeleft <= 0) {
-		xhci_warn(xhci, "%s while waiting for a slot\n",
-				timeleft == 0 ? "Timeout" : "Signal");
-		/* cancel the enable slot request */
-		ret = xhci_cancel_cmd(xhci, NULL, command->command_trb);
-		return ret;
-	}
+	wait_for_completion(command->completion);
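The behavioral difference in this hunk is a bounded wait versus an unbounded one. The userspace sketch below uses POSIX `pthread_cond_timedwait()` to build a hypothetical `struct sim_completion` with a timeout. It is not the kernel completion API, only a rough analogue of the removed `wait_for_completion_interruptible_timeout()` call, and signals are not modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace analogue of a kernel struct completion. */
struct sim_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void sim_init_completion(struct sim_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* What the command completion handler would do when the event arrives. */
static void sim_complete(struct sim_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Bounded wait: returns true if the completion fired, false if
 * timeout_ms elapsed first. An unbounded wait, like the new
 * wait_for_completion() call, has no "false" outcome at all.
 */
static bool sim_wait_for_completion_timeout(struct sim_completion *c,
					    long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool done;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to absorb spurious wakeups until done or the deadline passes. */
	while (!c->done && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}
```

A caller that gets `false` back can cancel the command and return an error, as the removed code did, instead of blocking forever.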
After this patch, however, we wait for a hardware event that is somehow never
generated, which causes a hang.
IMO, the assumption that "xhci_abort_cmd_ring() always generates an event and
handle_cmd_completion() is always called" does not hold if the HW is in a bad state.
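One way to avoid depending on that assumption is to bound the wait for the abort-related event as well, and fall back to a cleanup path if nothing arrives. This is only a hypothetical sketch of the idea, not a fix proposed or accepted in this thread: `sim_timeout_with_fallback()`, the probe callback and the polling budget are invented, and a real driver would use its own timer rather than polling. It completes only the timed-out command, which keeps the earlier concern about clearing the whole command ring in mind.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_cmd {
	bool completed;
	bool cancelled; /* completed by the fallback rather than by an event */
};

/* Hypothetical probe: returns true once the xHC has posted the event. */
typedef bool (*sim_event_probe)(void *ctx);

/*
 * After a successful abort, give the controller up to max_polls chances
 * to post the expected event. If it never does (controller in a bad
 * state), complete the timed-out command anyway so its waiter is not
 * left blocked forever.
 */
static void sim_timeout_with_fallback(struct sim_cmd *cmd, sim_event_probe probe,
				      void *ctx, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (probe(ctx)) {
			/* Normal path: the event handler would complete it. */
			cmd->completed = true;
			return;
		}
	}
	cmd->cancelled = true;
	cmd->completed = true;
}

/* Example probe: the controller never posts the event. */
static bool sim_probe_never(void *ctx)
{
	(void)ctx;
	return false;
}

/* Example probe: the event shows up after *ctx polls. */
static bool sim_probe_after(void *ctx)
{
	int *polls_left = ctx;

	return --*polls_left <= 0;
}
```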
Please share your opinion.