Re: [PATCH net-next v2 3/7] net: ipa: verify reference flag values

From: Alex Elder
Date: Sun Sep 13 2020 - 09:39:14 EST


On 9/12/20 9:25 PM, Bjorn Andersson wrote:
> On Fri 11 Sep 19:45 CDT 2020, Alex Elder wrote:
>
>> We take a single IPA clock reference to keep the clock running until
>> we get a system suspend operation, and maintain a flag indicating
>> whether that reference has been taken. When a suspend request
>> arrives, we drop that reference and clear the flag.
>>
>> In most places we simply set or clear the extra-reference flag.
>> Instead, primarily to catch coding errors, test the previous value
>> of the flag and report an error if that value is unexpected. And
>> if the clock reference is already taken, don't take another.
>>
>> In a couple of cases it's pretty clear atomic access is not
>> necessary and an error should never be reported. Report these
>> anyway, conveying our surprise with an added exclamation point.
>>
>> Signed-off-by: Alex Elder <elder@xxxxxxxxxx>
>> ---
>> v2: Updated to operate on a bitmap bit rather than an atomic_t.
>>
>> drivers/net/ipa/ipa_main.c | 23 ++++++++++++++++-------
>> 1 file changed, 16 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
>> index 409375b96eb8f..cfdf60ded86ca 100644
>> --- a/drivers/net/ipa/ipa_main.c
>> +++ b/drivers/net/ipa/ipa_main.c
>> @@ -83,6 +83,7 @@ static void ipa_suspend_handler(struct ipa *ipa, enum ipa_irq_id irq_id)
>> /* Take a a single clock reference to prevent suspend. All
>> * endpoints will be resumed as a result. This reference will
>> * be dropped when we get a power management suspend request.
>> + * The first call activates the clock; ignore any others.
>> */
>> if (!test_and_set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
>> ipa_clock_get(ipa);
>> @@ -502,14 +503,17 @@ static void ipa_resource_deconfig(struct ipa *ipa)
>> */
>> static int ipa_config(struct ipa *ipa, const struct ipa_data *data)
>> {
>> + struct device *dev = &ipa->pdev->dev;
>> int ret;
>>
>> /* Get a clock reference to allow initialization. This reference
>> * is held after initialization completes, and won't get dropped
>> * unless/until a system suspend request arrives.
>> */
>> - __set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
>> - ipa_clock_get(ipa);
>> + if (!__test_and_set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
>> + ipa_clock_get(ipa);
>> + else
>> + dev_err(dev, "suspend clock reference already taken!\n");
>>
>> ipa_hardware_config(ipa);
>>
>> @@ -544,7 +548,8 @@ static int ipa_config(struct ipa *ipa, const struct ipa_data *data)
>> err_hardware_deconfig:
>> ipa_hardware_deconfig(ipa);
>> ipa_clock_put(ipa);
>> - __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
>> + if (!__test_and_clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
>> + dev_err(dev, "suspend clock reference already dropped!\n");
>>
>> return ret;
>> }
>> @@ -562,7 +567,8 @@ static void ipa_deconfig(struct ipa *ipa)
>> ipa_endpoint_deconfig(ipa);
>> ipa_hardware_deconfig(ipa);
>> ipa_clock_put(ipa);
>> - __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
>> + if (!test_and_clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
>
> Doesn't this imply that we ran with the clocks disabled, which
> presumably would have nasty side effects?

Yes. This is one of the cases I described as "can't happen",
but I added the check anyway.

We call ipa_config() as the last step of ipa_probe(). The inverse
of ipa_config() is ipa_deconfig(), and it is called in two places:
- In the ipa_probe() error path. If the AP is loading firmware, it
does so *after* ipa_config() has been called and has returned
success. If firmware loading fails, ipa_deconfig() is called in
the error path to clean up. If we never reached ipa_config() in
the probe function, we never call ipa_deconfig() in the error path.
- In the ->remove function. If ipa_config() fails when called in
ipa_probe(), it cleans up all the state it changed and returns an
error. I *assume* that if the ->probe function returns an error,
the ->remove function will never be called. So again, we never
call ipa_deconfig() unless ipa_config() has been called.
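To make that argument a bit more concrete, here is a rough,
single-threaded userspace model of it. It is not driver code:
struct ipa_model and the model_*() functions are hypothetical, and
model_remove() simply encodes the assumption above that ->remove
only runs after a successful ->probe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not driver code. clock_refs stands in
 * for the net count of ipa_clock_get()/ipa_clock_put() calls, and
 * clock_held for the IPA_FLAG_CLOCK_HELD bit. The model is
 * single-threaded, so plain reads and writes replace the bitops.
 */
struct ipa_model {
	int clock_refs;		/* net clock references held */
	bool clock_held;	/* models IPA_FLAG_CLOCK_HELD */
	int errors;		/* count of "can't happen" reports */
	bool bound;		/* probe succeeded, so ->remove may run */
};

/* Models a successful ipa_config(): take the reference unless held. */
static void model_config(struct ipa_model *m)
{
	if (!m->clock_held) {
		m->clock_held = true;
		m->clock_refs++;
	} else {
		m->errors++;	/* "suspend clock reference already taken!" */
	}
}

/* Models ipa_deconfig(): drop the reference, then check the flag. */
static void model_deconfig(struct ipa_model *m)
{
	m->clock_refs--;
	if (m->clock_held)
		m->clock_held = false;
	else
		m->errors++;	/* "no suspend clock reference" */
}

/* Models ipa_probe(). A failed ipa_config() is assumed to undo its own
 * changes, so the only deconfig call here is the firmware-load error path.
 */
static int model_probe(struct ipa_model *m, bool config_ok, bool firmware_ok)
{
	assert(!m->bound);	/* a bound device is not probed again */
	if (!config_ok)
		return -1;
	model_config(m);
	if (!firmware_ok) {
		model_deconfig(m);
		return -1;
	}
	m->bound = true;
	return 0;
}

/* Models the assumption that ->remove runs only after ->probe succeeded. */
static void model_remove(struct ipa_model *m)
{
	if (m->bound) {
		model_deconfig(m);
		m->bound = false;
	}
}
```

Driving the model through a successful probe followed by remove, a
firmware-load failure, or an ipa_config() failure leaves clock_refs
and errors at 0 in every case. Only calling model_deconfig() without
a prior model_config() trips the check.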

That's the reasoning, anyway. That said, you make a very good
point: the whole purpose of this check is to catch coding errors,
and a WARN() call, which also dumps a stack trace, would provide
much better information than an error message alone.
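A rough userspace sketch of that substitution follows.
WARN_ON_MODEL(), test_and_clear_bit_model() and CLOCK_HELD_BIT are
hypothetical stand-ins for WARN_ON(), test_and_clear_bit() and
IPA_FLAG_CLOCK_HELD. In the driver itself the check would presumably
shrink to WARN_ON(!test_and_clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags)),
relying on WARN_ON() evaluating to its condition.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for WARN_ON(): like the kernel macro it
 * evaluates to the truth value of its condition, but it only prints a
 * location. The real macro also dumps a stack trace and taints the
 * kernel, which is what automated test systems notice.
 */
static int warn_count;

static bool warn_on_model(bool cond, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: at %s:%d\n", file, line);
	}
	return cond;
}
#define WARN_ON_MODEL(cond) warn_on_model(!!(cond), __FILE__, __LINE__)

/* Userspace analogue of test_and_clear_bit(): atomically clear bit nr
 * in *addr and return whether it was previously set.
 */
static bool test_and_clear_bit_model(int nr, atomic_ulong *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

#define CLOCK_HELD_BIT 0	/* hypothetical stand-in for IPA_FLAG_CLOCK_HELD */

/* The ipa_deconfig() flag check with a warning in place of dev_err(). */
static void deconfig_flag_check(atomic_ulong *flags)
{
	WARN_ON_MODEL(!test_and_clear_bit_model(CLOCK_HELD_BIT, flags));
}
```

The first call finds the bit set and clears it silently; a second
call finds it already clear and warns, which is the "can't happen"
case in ipa_deconfig().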

So I plan to update this in a new version of this patch (and
series). I'll wait until tonight or tomorrow to see if there is
any other feedback before preparing it.

Thanks a lot.

-Alex

> This seems like something that is worthy of more than a simple
> printout, which no one will actually read. If you instead use a
> WARN_ON() to highlight this, at least some of the test environments
> out there will pick it up and report it...
>
> Regards,
> Bjorn
>
>> + dev_err(&ipa->pdev->dev, "no suspend clock reference\n");
>> }
>>
>> static int ipa_firmware_load(struct device *dev)
>> @@ -913,7 +919,8 @@ static int ipa_suspend(struct device *dev)
>> struct ipa *ipa = dev_get_drvdata(dev);
>>
>> ipa_clock_put(ipa);
>> - __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
>> + if (!test_and_clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
>> + dev_err(dev, "suspend: missing suspend clock reference\n");
>>
>> return 0;
>> }
>> @@ -933,8 +940,10 @@ static int ipa_resume(struct device *dev)
>> /* This clock reference will keep the IPA out of suspend
>> * until we get a power management suspend request.
>> */
>> - __set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
>> - ipa_clock_get(ipa);
>> + if (!test_and_set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
>> + ipa_clock_get(ipa);
>> + else
>> + dev_err(dev, "resume: duplicate suspend clock reference\n");
>>
>> return 0;
>> }
>> --
>> 2.20.1
>>