Re: [PATCH 1/1] ipmi: setting mod_timer for read_event_msg buffercmd

From: Srinivas_G_Gowda
Date: Wed Dec 04 2013 - 06:59:30 EST


On 12/03/2013 10:40 PM, Corey Minyard wrote:

> On 12/03/2013 12:18 AM, Srinivas_G_Gowda@xxxxxxxx wrote:
>> Dell - Internal Use - Confidential
>>
>> Hi Corey,
>>> Unfortunately, that would start the timer unnecessarily. You don't want to start timers unnecessarily in the kernel or the power management police will come after you.
>> I still see the issue after applying the patch.
>> I have one question, for commands such as IPMI_READ_EVENT_MSG_BUFFER_CMD in smi_event_handler() that are invoked by the driver itself where do you expect the timer to be set ??
>
> The timer should be done by the calling function. Since the timer is
> actually used to time out complete operations, you don't want to reset
> it from interrupts or the kthread. The patch I added would start the
> timer if it wasn't running and the event returned something non-idle.
>
> So what should happen is smi_event_handler() starts the event message
> buffer read, goes back to the top of smi_event_handler() and handles the
> new message. It should return something besides idle in that case. And
> thus the timer should be started.
>
>>> The patch I sent did have this call in the non-idle portion of the kernel thread and that should have done the same thing. Plus, if you are using the kernel thread, it's going to run periodically and should kick things off again if they get stuck. I'm suspicious now that something else is going on.
>> ipmi_thread() is getting invoked when the issue is seen, but I keep hitting this condition
>> else if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait)
>> Since the timer is not armed for IPMI_READ_EVENT_MSG_BUFFER_CMD, smi_event_handler calls with time value 0, which doesn't help when OBF is stuck.
>> We keep running around this loop since we never hit the kcs error states.
>
> Maybe the timer start function needs to be called from that else clause,
> but it didn't seem so to me. It wouldn't hurt, I suppose. But the
> event handler should eventually return something non-idle and the timer
> start function should get called. Is the ipmi thread just stuck using
> 100% CPU?

Yes

> Does the final else clause ever get hit? Maybe the bug is in
> ipmi_thread_busy_wait().
>

No, I did not see final else in ipmi_thread ever getting hit.

I am looking at ipmi_thread_busy_wait(),
by default the below condition never gets set unitl I explicetly set "kipmid_max_busy_us" to some value.
if (smi_info->intf_num < num_max_busy_us)

This means by default "max_busy_us" is always 0. Hence Ill basically end up only doing this
if (max_busy_us == 0 || smi_result != SI_SM_CALL_WITH_DELAY)
ipmi_si_set_not_busy(busy_until);

That probably explains why I never hit the else condition in ipmi_thread.

I see ipmi_start_timer_if_necessary() getting called from ipmi_thread() after setting "kipmid_max_busy_us"
Ill set "kipmid_max_busy_us=300" and run the stress. I am hoping that we will not see the issue in this case.


Thanks,
G



> Thanks,
>
> -corey
>
>>
>>
>> Thanks,
>> G
>>
>> -----Original Message-----
>> From: Corey Minyard [mailto:tcminyard@xxxxxxxxx] On Behalf Of Corey Minyard
>> Sent: Tuesday, December 03, 2013 2:34 AM
>> To: Gowda, Srinivas G
>> Cc: tcminyard@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; openipmi@xxxxxxxxxx
>> Subject: Re: [PATCH 1/1] ipmi: setting mod_timer for read_event_msg buffer cmd On 12/02/2013 08:49 AM, Srinivas_G_Gowda@xxxxxxxx wrote:
>>> Thanks for the patch Corey.
>>> I am afraid that the system does not have interrupts enabled. It uses polling mode.
>>>
>>> When the error is seen, I know for a fact that in function
>>> ipmi_thread() smi_result is SI_SM_CALL_WITH_DELAY, I have some logs where in busy_wait always reads as 1. Not sure if it was ever set to 0. (ill check this again).
>>> Ill anyway run the test using the patch that you have shared.
>>>
>>> b/w would it harm if we were to do to something like this ?
>> Unfortunately, that would start the timer unnecessarily. You don't want to start timers unnecessarily in the kernel or the power management police will come after you.
>> The patch I sent did have this call in the non-idle portion of the kernel thread and that should have done the same thing. Plus, if you are using the kernel thread, it's going to run periodically and should kick things off again if they get stuck. I'm suspicious now that something else is going on.
>> -corey
>>> Signed-off-by: Srinivas Gowda <srinivas_g_gowda@xxxxxxxx>
>>> ---
>>> drivers/char/ipmi/ipmi_si_intf.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/char/ipmi/ipmi_si_intf.c
>>> b/drivers/char/ipmi/ipmi_si_intf.c
>>> index 15e4a60..e23484f 100644
>>> --- a/drivers/char/ipmi/ipmi_si_intf.c
>>> +++ b/drivers/char/ipmi/ipmi_si_intf.c
>>> @@ -1008,6 +1008,7 @@ static int ipmi_thread(void *data)
>>> spin_unlock_irqrestore(&(smi_info->si_lock), flags);
>>> busy_wait = ipmi_thread_busy_wait(smi_result, smi_info,
>>> &busy_until);
>>> + ipmi_start_timer_if_necessary(smi_info);
>>> if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
>>> ; /* do nothing */
>>> else if (smi_result == SI_SM_CALL_WITH_DELAY &&
>>> busy_wait)
>



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/