Re: [PATCH i2c-next v6] i2c: aspeed: Handle master/slave combined irq events properly

From: CÃdric Le Goater
Date: Thu Sep 13 2018 - 04:17:05 EST


On 09/12/2018 06:54 PM, Jae Hyun Yoo wrote:
> On 9/11/2018 6:34 PM, Guenter Roeck wrote:
>> On Tue, Sep 11, 2018 at 04:58:44PM -0700, Jae Hyun Yoo wrote:
>>> On 9/11/2018 4:33 PM, Guenter Roeck wrote:
>>>> Looking into the patch, clearing the interrupt status at the end of an
>>>> interrupt handler is always suspicious and tends to result in race
>>>> conditions (because additional interrupts may have arrived while handling
>>>> the existing interrupts, or because interrupt handling itself may trigger
>>>> another interrupt). With that in mind, the following patch fixes the
>>>> problem for me.
>>>>
>>>> Guenter
>>>>
>>>> ---
>>>>
>>>> diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
>>>> index c258c4d9a4c0..c488e6950b7c 100644
>>>> --- a/drivers/i2c/busses/i2c-aspeed.c
>>>> +++ b/drivers/i2c/busses/i2c-aspeed.c
>>>> @@ -552,6 +552,8 @@ static irqreturn_t aspeed_i2c_bus_irq(int irq, void *dev_id)
>>>> ÂÂÂÂÂ spin_lock(&bus->lock);
>>>> ÂÂÂÂÂ irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
>>>> +ÂÂÂ /* Ack all interrupt bits. */
>>>> +ÂÂÂ writel(irq_received, bus->base + ASPEED_I2C_INTR_STS_REG);
>>>> ÂÂÂÂÂ irq_remaining = irq_received;
>>>> Â #if IS_ENABLED(CONFIG_I2C_SLAVE)
>>>> @@ -584,8 +586,6 @@ static irqreturn_t aspeed_i2c_bus_irq(int irq, void *dev_id)
>>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂ "irq handled != irq. expected 0x%08x, but was 0x%08x\n",
>>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂ irq_received, irq_handled);
>>>> -ÂÂÂ /* Ack all interrupt bits. */
>>>> -ÂÂÂ writel(irq_received, bus->base + ASPEED_I2C_INTR_STS_REG);
>>>> ÂÂÂÂÂ spin_unlock(&bus->lock);
>>>> ÂÂÂÂÂ return irq_remaining ? IRQ_NONE : IRQ_HANDLED;
>>>> Â }
>>>>
>>>
>>> My intention of putting the code at the end of interrupt handler was,
>>> to reduce possibility of combined irq calls which is explained in this
>>> patch. But YES, I agree with you. It could make a potential race
>>
>> Hmm, yes, but that doesn't explain why it would make sense to acknowledge
>> the interrupt late. The interrupt ack only means "I am going to handle these
>> interrupts". If additional interrupts arrive while the interrupt handler
>> is active, those will have to be acknowledged separately.
>>
>> Sure, there is a risk that an interrupt arrives while the handler is
>> running, and that it is handled but not acknowledged. That can happen
>> with pretty much all interrupt handlers, and there are mitigations to
>> limit the impact (for example, read the interrupt status register in
>> a loop until no more interrupts are pending). But acknowledging
>> an interrupt that was possibly not handled is always bad idea.
>
> Well, that's generally right but not always. Sometimes that depends on
> hardware and Aspeed I2C is the case.
>
> This is a description from Aspeed AST2500 datasheet:
> Â I2CD10 Interrupt Status Register
> Â bit 2 Receive Done Interrupt status
> ÂÂÂÂÂÂÂ S/W needs to clear this status bit to allow next data receiving.
>
> It means, driver should hold this bit to prevent transition of hardware
> state machine until the driver handles received data, so the bit should
> be cleared at the end of interrupt handler.
>
> Let me share my test result. Your code change works on 100KHz bus speed
> but doesn't work well on 1MHz bus speed. Investigated that interrupt
> handling is fast enough in 100KHz test but in 1MHz, most of data is
> corrupted because the bit is cleared at the beginning of interrupt
> handler so it allows receiving of the next data but the interrupt
> handler isn't fast enough to read the data buffer on time. I checked
> this problem on BMC-ME channel which ME sends lots of IPMB packets to
> BMC at 1MHz speed. You could simply check the data corruption problem on
> the BMC-ME channel.

OK.

> My thought is, the current code is right for real Aspeed I2C hardware.
> It seems that QEMU 3.0 model for witherspoon-bmc doesn't simulate the
> actual Aspeed I2C hardware correctly.

That might be very well possible yes. it also misses support for the slave
mode and the DMA registers.

Thanks for the info,

C.