Thanks for your comment, see the explaination inline.
We'll apply your advice in later patch.
Please don't duplicate this code in the driver, this is part of libata core in libata-scsi.c. Add an export for these functions if you need to use them in the driver.
[kuan]: These calls are declared in static type . I can't export them
and don't want to
modify libata code.
I'm still puzzling out how this stuff all works, but it looks like this code makes you stop sending new commands if:
-the port is in the FPDMA Data Phase (DMA Setup FIS received but the transfer is not complete yet) - I assume the hardware doesn't handle this itself, which seems rather unique
-we previously deferred a command inside of qc_issue because we were in the FPDMA data phase
-we previously saw dhfis_flags not equal to qc_active, or we got a BACKOUT interrupt (whatever exactly that means), both of which set some value in the back_byte
[kuan]: -If we got BACKOUT interrupt, it means that a command just sent by
driver
backed out.The driver should resend the command.So new commands should
be defered.
-If dhfis_flags != qc_active, it indicates that the last command doesn't
generate a device to host register FIS .
After sending some commands, I found that the last command sometimes has
this problem but previous commands are normal.In this case, we need resend the last
command.
Both cases set back_byte.
This code seems a bit odd. Isn't this tossing out a bunch of potential error status, etc?
[kuan]:
If there are commands in queue, the driver can send a new command only after receiving dhfis intr of previous command and before receiving
any dmasetup fis intr.
In this place, i do the last check before sending the command.
+ done_mask = pp->qc_active ^ sactive;
+ if (unlikely(done_mask & sactive)) {
+ ata_port_printk(ap, KERN_ERR, "illegal qc_active
transition "
+ "(%08x->%08x)\n", ap->qc_active,
sactive);
+ return -EINVAL;
+ }
Shouldn't this trigger error handling if it happens instead of just printing an error?
[kuan]:
I think the error handling can be triggered by timeout. In fact, this case should seldom happen.
Additional/general comments:
Think you need some code to handle suspend and resume (re-enable SATA MMIO space, etc.)