Re: A potential data race in drivers/iio/adc/berlin2-adc.ko

From: Jonathan Cameron
Date: Sat Mar 20 2021 - 10:54:53 EST


On Thu, 18 Mar 2021 09:47:29 +0100
Lars-Peter Clausen <lars@xxxxxxxxxx> wrote:

> On 3/18/21 9:27 AM, Lars-Peter Clausen wrote:
> > On 3/18/21 9:07 AM, Pavel Andrianov wrote:
> >> Hi,
> >>
> >> berlin2_adc_probe [1] registers two interrupt handlers:
> >> berlin2_adc_irq [2]
> >> and berlin2_adc_tsen_irq [3]. The interrupt handlers operate with the
> >> same data, for example, modify
> >> priv->data with different masks:
> >>
> >> priv->data &= BERLIN2_SM_ADC_MASK;
> >> and
> >> priv->data &= BERLIN2_SM_TSEN_MASK;
> >>
> >> If the two interrupt handlers are executed simultaneously, a
> >> potential data race takes place. So, the question is if the situation
> >> is possible. For example, in the case of the handlers are executed on
> >> different CPU cores.

If we assume there is a race here, the reading into priv->data from
two different registers on the line above the masking is more of any issue.

> >>
> >> Best regards,
> >> Pavel
> >>
> >> [1]
> >> https://elixir.bootlin.com/linux/latest/source/drivers/iio/adc/berlin2-adc.c#L283
> >>
> >> [2]
> >> https://elixir.bootlin.com/linux/latest/source/drivers/iio/adc/berlin2-adc.c#L239
> >>
> >> [3]
> >> https://elixir.bootlin.com/linux/latest/source/drivers/iio/adc/berlin2-adc.c#L259
> >>
> > Looking at the code there are two functions. berlin2_adc_tsen_read()
> > and berlin2_adc_read(). These two function are take the same mutex and
> > can not run concurrently. At the beginning of the protected section
> > the corresponding interrupt for that function is enabled and at the
> > end disabled. So at least if the hardware works correctly those two
> > interrupts will never fire at the same time.
> >
> > Now, if the hardware misbehaves the two interrupts could still fire at
> > the same time.
> >
> > - Lars
> >
> Actually thinking a bit more about this the interrupt could still fire
> after it has been disabled since there is no synchronization between the
> disable and the interrupt handler. And the handler might be queued on
> one CPU, while the disable is running on another CPU.
>
Superficially it looks like splitting the priv->data and related priv->data_available
into versions for the normal ADC and the touch screen ADC paths should solve
this at the trivial cost of a couple of elements in that structure.
Possibly also need to deal with the wait_queue but I think that's fine as is.
(haven't thought about it that much!)

Jonathan