Re: [PATCH 0/1] Fix race in ipmi timer cleanup

From: Jes Sorensen
Date: Wed Aug 28 2019 - 20:53:54 EST


On 8/28/19 6:32 PM, Corey Minyard wrote:
> On Wed, Aug 28, 2019 at 04:36:24PM -0400, Jes Sorensen wrote:
>> From: Jes Sorensen <jsorensen@xxxxxx>
>>
>> I came across this in 4.16, but I believe the bug is still present
>> in current 5.x, even if it is less likely to trigger.
>>
>> Basially stop_timer_and_thread() only calls del_timer_sync() if
>> timer_running == true. However smi_mod_timer enables the timer before
>> setting timer_running = true.
>
> All the modifications/checks for timer_running should be done under
> the si_lock. It looks like a lock is missing in shutdown_smi(),
> probably starting before setting interrupt_disabled to true and
> after stop_timer_and_thread. I think that is the right fix for
> this problem.

Hi Corey,

I agree a spin lock could deal with this specific issue too, but calling
del_timer_sync() is safe to call on an already disabled timer. The whole
flagging of timer_running really doesn't make much sense in the first
place either.

As for interrupt_disabled that is even worse. There's multiple places in
the code where interrupt_disabled is checked, some of them are not
protected by a spin lock, including shutdown_smi() where you have this
sequence:

while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
poll(smi_info);
schedule_timeout_uninterruptible(1);
}
if (smi_info->handlers)
disable_si_irq(smi_info);
while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
poll(smi_info);
schedule_timeout_uninterruptible(1);
}

In this case you'll have to drop and retake the long several times.

You also have this call sequence which leads to disable_si_irq() which
checks interrupt_disabled:

flush_messages()
smi_event_handler()
handle_transaction_done()
handle_flags()
alloc_msg_handle_irq()
disable_si_irq()

{disable,enable}_si_irq() themselves are racy:

static inline bool disable_si_irq(struct smi_info *smi_info)
{
if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) {
smi_info->interrupt_disabled = true;

Basically interrupt_disabled need to be atomic here to have any value,
unless you ensure to have a spin lock around every access to it.

Cheers,
Jes