Re: [RFC: PATCH 2/2] iio: adc: exynos_adc: Handle timeout and raceconditions

From: Lars-Peter Clausen
Date: Fri Apr 05 2013 - 03:54:20 EST


On 04/03/2013 07:06 PM, Doug Anderson wrote:
> Lars,
>
> On Sat, Mar 16, 2013 at 7:41 AM, Lars-Peter Clausen <lars@xxxxxxxxxx> wrote:
>> I think you still need the mutex for serialization, otherwise the requests
>> would just cancel each other out. Btw. what happens if you start a conversion
>> while another is still in progress? Is it possible to abort a conversion?
>
> I was thinking that the spinlock would just replace the mutex for the
> purposes of serialization.
>

Since we sleep inside the protected section we need to use a mutex.

> I stepped back a bit, though, and I'm wondering if we're over-thinking
> things. The timeout case should certainly be handled properly (thanks
> for pointing it out), but getting a timeout is really not expected and
> adding a lot of extra overhead to handle it elegantly seems a bit
> much?
>
> Specifically, the mutex means that we have one user of the ADC at a
> time, and ADC conversion has nothing variable about it. The user
> manual that I have access to talks about 12-bit conversion happening
> in 1 microsecond with a 5MHz input clock or 5 microseconds with a 1MHz
> input clock. Even if someone has clocks configured very differently,
> it would be hard to imagine a conversion actually taking a full
> second.
>
> ...so that means that if the timeout actually fires then something
> else fairly drastic has gone wrong. It's _very_ unlikely that the IRQ
> will still go off for this conversion sometime in the future.
>

It's not the timeout case I'm worried about, but the case where the transfer
is interrupted by the user. Even though it is rather unlikely for the
problem to occur we should still try to avoid it, this is one of these
annoying heisenbugs that happen once in a while and nobody is able to
reproduce them.

> To me, total modifications to what's landed already ought to be:
>
> * Change timeout to long (from unsigned long)
>
> * Make sure we return errors (negative results) from
> wait_for_completion_interruptible_timeout() properly.
>
> * If we get back a value of 0 from
> wait_for_completion_interruptible_timeout() then we should print a
> warning and attempt machinations to reset the ADC. Without ever
> seeing real-world situtations that would cause a real timeout these
> machinations would be a bit of a guess (is resetting the adc useful
> when it's more likely that someone accidentally messed with the clock
> tree or power gated the ADC?)... ...or perhaps a warning and a TODO
> in the code would be enough?
>
>
> Thoughts?

I think most of this is already implemented and Naveen sent a patch to reset
the controller in case of a timeout, which is a good idea and works fine,
but you still should handle the case where the user aborted the transfer.
Just resetting the core should work as well in that case.

- Lars
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/