Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()

From: Damien Le Moal
Date: Mon Oct 31 2022 - 01:59:24 EST


On 10/28/22 17:33, John Garry wrote:
> On 28/10/2022 09:07, Damien Le Moal wrote:
>>> Well, yeah. So if some error happens and EH kicks in, then full queue
>>> depth of requests may be allocated. I have seen this for NCQ error. So
>>> this is why I make in very first patch change allow us to allocate
>>> reserved request from sdev request queue even when budget is fully
>>> allocated.
>>>
>>> Please also note that for AHCI, I make reserved depth =1, while for SAS
>>> controllers it is greater. This means that in theory we could alloc > 1x
>>> reserved command for SATA disk, but I don't think it matters.
>> Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that
>> the
>> user can only use 31 tags ? or is it 32+1 reserved ? which we can do
>> since
>> when using the reserved request, we will not use a hw tag (all reserved
>> requests will be non-ncq).
>>
>> The 32 + 1 scheme will work.
>
> Yes, 32 + 1 is what we want. I now think that there is a mistake in my
> code in this series for ahci.
>
> So I think we need for ahci:
>
> can_queue = 33

Hmm.. For ATA, can_queue should be only 32. nr_tags is going to be 33
though as we include one tag for a reserved command. No ? (may be I
misunderstand can_queue though).

> nr_reserved_cmds = 1
>
> while I only have can_queue = 32

Which seems right to me.

>
> I need to check that again for ahci driver and AHCI SHT...
>
>> But for CDL command completion handling, we
>> will need a NCQ command to do a read log, to avoid forcing a queue drain.
>> For that to reliably work, we'll need a 31+1+1 setup...
>>
>
> So is your idea to permanently reserve 1 more command from 32 commands ?

Yes. Command Duration Limits has this weird case were commands may be
failed when exceeding their duration limit with a "good status" and
"sense data available bit" set. This case was defined to avoid the queue
stall that happens with any NCQ error. The way to handle this without
relying on EH (as that would also cause a queue drain) is to issue an
NCQ read log command to fetch the "sense data for successful NCQ
commands" log, to retrieve the sense data for the completed command and
check if it really failed or not. So we absolutely need a reserved
command for this, Without a reserved command, it is a nightmare to
support (tag reuse would be another solution, but it is horrible).

> Or re-use 1 from 32 (and still also have 1 separate internal command)?

I am not yet 100% sure if we can treat that internal NCQ read log like
any other read/write request... If we can, then the 1-out-of-32
reservation would not be needed. Need to revisit all the cases we need
to take care of (because in the middle of this CDL completion handling,
regular NCQ errors can happen, resulting in a drive reset that could
wreck everything as we lose the sense data for the completed requests).

In any case, I think that we can deal with that extra reserved command
on top of you current series. No need to worry about it for now I think.

>
> Thanks,
> John

--
Damien Le Moal
Western Digital Research