Re: [PATCH] ipmi: Stop timers before cleaning up the module
From: Greg KH
Date: Thu Dec 14 2017 - 12:46:18 EST
On Thu, Dec 14, 2017 at 09:16:38AM -0600, minyard@xxxxxxx wrote:
> From: Masamitsu Yamazaki <m-yamazaki@xxxxxxxxxxxxx>
>
> commit 4f7f5551a760eb0124267be65763008169db7087 upstream.
>
> System may crash after unloading ipmi_si.ko module
> because a timer may remain and fire after the module cleaned up resources.
>
> cleanup_one_si() contains the following processing.
>
> /*
> * Make sure that interrupts, the timer and the thread are
> * stopped and will not run again.
> */
> if (to_clean->irq_cleanup)
> to_clean->irq_cleanup(to_clean);
> wait_for_timer_and_thread(to_clean);
>
> /*
> * Timeouts are stopped, now make sure the interrupts are off
> * in the BMC. Note that timers and CPU interrupts are off,
> * so no need for locks.
> */
> while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
> poll(to_clean);
> schedule_timeout_uninterruptible(1);
> }
>
> si_state changes as following in the while loop calling poll(to_clean).
>
> SI_GETTING_MESSAGES
> => SI_CHECKING_ENABLES
> => SI_SETTING_ENABLES
> => SI_GETTING_EVENTS
> => SI_NORMAL
>
> As written in the code comments above,
> timers are expected to stop before the polling loop and not to run again.
> But the timer is set again in the following process
> when si_state becomes SI_SETTING_ENABLES.
>
> => poll
> => smi_event_handler
> => handle_transaction_done
> // smi_info->si_state == SI_SETTING_ENABLES
> => start_getting_events
> => start_new_msg
> => smi_mod_timer
> => mod_timer
>
> As a result, before the timer set in start_new_msg() expires,
> the polling loop may see si_state becoming SI_NORMAL
> and the module clean-up finishes.
>
> For example, hard LOCKUP and panic occurred as following.
> smi_timeout was called after smi_event_handler,
> kcs_event and hangs at port_inb()
> trying to access I/O port after release.
>
> [exception RIP: port_inb+19]
> RIP: ffffffffc0473053 RSP: ffff88069fdc3d80 RFLAGS: 00000006
> RAX: ffff8806800f8e00 RBX: ffff880682bd9400 RCX: 0000000000000000
> RDX: 0000000000000ca3 RSI: 0000000000000ca3 RDI: ffff8806800f8e40
> RBP: ffff88069fdc3d80 R8: ffffffff81d86dfc R9: ffffffff81e36426
> R10: 00000000000509f0 R11: 0000000000100000 R12: 0000000000]:000000
> R13: 0000000000000000 R14: 0000000000000246 R15: ffff8806800f8e00
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
> --- <NMI exception stack> ---
>
> To fix the problem I defined a flag, timer_can_start,
> as member of struct smi_info.
> The flag is enabled immediately after initializing the timer
> and disabled immediately before waiting for timer deletion.
>
> Fixes: 0cfec916e86d ("ipmi: Start the timer and thread on internal msgs")
> Signed-off-by: Yamazaki Masamitsu <m-yamazaki@xxxxxxxxxxxxx>
> [Some fairly major changes went into the IPMI driver in 4.15, so this
> required a backport as the code had changed and moved to a different
> file. The 4.14 version of this patch moved some code under an
> if statement causing it to not apply to 4.7-4.13.]
> Signed-off-by: Corey Minyard <cminyard@xxxxxxxxxx>
> ---
> This is for kernel versions 4.7-4.13 only. Code and API changes required
> backporting. There is another version for 4.14 and another for
> 4.4-4.6 coming, too. Bug was introduced in 4.4.
All 3 patches now applied, thanks for the backports.
greg k-h