Re: [PATCH v1 14/15] scsi: ufs: commit descriptors before setting the doorbell

From: Rob Herring
Date: Thu Aug 27 2015 - 13:20:15 EST


On Thu, Aug 27, 2015 at 7:11 AM, <ygardi@xxxxxxxxxxxxxx> wrote:
>> On Tue, Aug 25, 2015 at 7:36 AM, <ygardi@xxxxxxxxxxxxxx> wrote:
>>>> On Aug 21, 2015 3:10 PM, "Yaniv Gardi" <ygardi@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Add a write memory barrier to make sure the prepared descriptors are
>>>>> actually written to memory before ringing the doorbell. We have also
>>>>> added a write memory barrier after ringing the doorbell register so
>>>>> that the controller sees the new request immediately.
>>>>>
>>>>> Signed-off-by: Yaniv Gardi <ygardi@xxxxxxxxxxxxxx>
>>>>>
>>>>> ---
>>>>> drivers/scsi/ufs/ufshcd.c | 6 ++++++
>>>>> 1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>>>>> index fef0660..876148b 100644
>>>>> --- a/drivers/scsi/ufs/ufshcd.c
>>>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>>>> @@ -833,6 +833,8 @@ void ufshcd_send_command(struct ufs_hba *hba,
>>>>> unsigned int task_tag)
>>>>> ufshcd_clk_scaling_start_busy(hba);
>>>>> __set_bit(task_tag, &hba->outstanding_reqs);
>>>>> ufshcd_writel(hba, 1 << task_tag,
>>>>> REG_UTP_TRANSFER_REQ_DOOR_BELL);
>>>>> + /* Make sure that doorbell is committed immediately */
>>>>> + wmb();
>>>>
>>>> Is this really necessary? Is there a measurable difference?
>>>
>>> I'm not sure whether there is a measurable difference, but since the
>>> doorbell register is the one that actually triggers the HW execution
>>> of the requests, it is recommended that its value be written to
>>> memory immediately.
>>
>> A barrier doesn't guarantee speed, only ordering. Unless you can
>> measure the difference, you should not have it.
>
> Rob,
> let me give an example:
> context#1 updates the outstanding_reqs variable and then writes DOOR_BELL.
> context#2, upon the interrupt for a request completion, reports
> completion for each of the bits set in:
> outstanding_reqs ^ read(DOOR_BELL);
>
> 0. Assume DOOR_BELL = 0x1 (one active request, in slot 0).
> 1. context#1: updates DOOR_BELL to 0x3 (two active requests, in slots
> 0 and 1).
> 2. The new value 0x3 has not yet reached the doorbell register, so
> DOOR_BELL is still 0x1, but outstanding_reqs is already updated to 0x3.
> 3. The request in slot 0 completes and an interrupt fires, so
> DOOR_BELL is now 0x0 (request in slot 0 completed).
> 4. context#2: outstanding_reqs ^ read(DOOR_BELL) = 0x3 ^ 0x0 = 0x3 =>
> wrong conclusion, since the request in slot 1 never completed and in
> fact never started.

Barriers alone will never solve this problem. They may narrow the
window, but the problem is still there. What you need is a spinlock
around all accesses to both outstanding_reqs and the doorbell
register. And guess what, spinlocks have appropriate barriers to
ensure visibility of what they protect. Or perhaps the h/w provides
another way to signal which slots have completed. Using the same
register for doorbell and completion status is not ideal.

Rob