Re: [PATCH 09/18] soc: qcom: ipa: GSI transactions

From: Alex Elder
Date: Sun May 19 2019 - 13:13:53 EST


On 5/17/19 1:44 PM, Alex Elder wrote:
> On 5/17/19 1:33 PM, Arnd Bergmann wrote:
>> On Fri, May 17, 2019 at 8:08 PM Alex Elder <elder@xxxxxxxxxx>
>> wrote:
>>>
>>> On 5/15/19 2:34 AM, Arnd Bergmann wrote:
>>>>> +static void gsi_trans_tre_fill(struct gsi_tre *dest_tre,
>>>>> dma_addr_t addr, + u32 len, bool
>>>>> last_tre, bool bei, + enum
>>>>> ipa_cmd_opcode opcode) +{ + struct gsi_tre tre; + +
>>>>> tre.addr = cpu_to_le64(addr); + tre.len_opcode =
>>>>> gsi_tre_len_opcode(opcode, len); + tre.reserved = 0; +
>>>>> tre.flags = gsi_tre_flags(last_tre, bei, opcode); + +
>>>>> *dest_tre = tre; /* Write TRE as a single (16-byte)
>>>>> unit */ +}
>>>> Have you checked that the atomic write is actually what happens
>>>> here, but looking at the compiler output? You might need to add
>>>> a 'volatile' qualifier to the dest_tre argument so the
>>>> temporary structure doesn't get optimized away here.
>>>
>>> Currently, the assignment *does* become a "stp" instruction. But
>>> I don't know that we can *force* the compiler to write it as a
>>> pair of registers, so I'll soften the comment with "Attempt to
>>> write" or something similar.
>>>
>>> To my knowledge, adding a volatile qualifier only prevents the
>>> compiler from performing funny optimizations, but that has no
>>> effect on whether the 128-bit assignment is made as a single
>>> unit. Do you know otherwise?
>>
>> I don't think it you can force the 128-bit assignment to be atomic,
>> but marking 'dest_tre' should serve to prevent a specific
>> optimization that replaces the function with
>>
>> dest_tre->addr = ... dest_tre->len_opcode = ... dest_tre->reserved
>> = ... dest_tre->flags = ...
>>
>> which it might find more efficient than the stp and is equivalent
>> when the pointer is not marked volatile. We also have the
>> WRITE_ONCE() macro that can help prevent this, but it does not work
>> reliably beyond 64 bit assignments.
>
> OK, I'll mark it volatile to avoid that potential result.

OK I got interesting results so I wanted to report back.

The way it is currently written (no volatile qualifier) is
the *only* way I get a 16-byte store instruction.

Specifically, with dest_tre having type (struct gsi_tre *):
*dest_tre = tre;

Produces this:
4ec: a9002013 stp x19, x8, [x0]

I attempted to make the assigned-to pointer volatile:
*(volatile struct gsi_tre *)dest_tre = tre;

But that apparently is interpreted as "assign things
in the destination in exactly the way they were assigned
above into the "tre" structure." Because it produced this:
748: f9000348 str x8, [x26]
74c: 7940c3e8 ldrh w8, [sp, #96]
750: 79001348 strh w8, [x26, #8]
754: 7940bbe8 ldrh w8, [sp, #92]
758: 79001748 strh w8, [x26, #10]
75c: b9405be8 ldr w8, [sp, #88]
760: b9000f48 str w8, [x26, #12]

>From there I went back and changed the type of the dest_tre pointer
parameter to (volatile struct gsi_tre *), and changed the the type
of the values passed through that argument in the two callers to
also have the volatile qualifier. This way there was no need to
use a cast in the left-hand side. That too produced a string of
separate assignments, not a single 128-bit one.

I also attempted this:
*dest_tre = *(volatile struct gsi_tre *)&tre;
And even this:
*(volatile struct gsi_tre *)dest_tre = *(volatile struct gsi_tre *)&tre;
But neither produced a single "stp" instruction; all produced
the same sequence of instructions above.

So it seems that I must *not* apply a volatile qualifier,
because doing so restricts the compiler from making the
single instruction optimization.

If I've missed something and you have another suggestion for
me to try let me know and I'll try it.

-Alex


> Thanks.
>
> -Alex
>
>>
>> Arnd
>>
>