Re: [PATCH] tty: vt: Fix soft lockup in fbcon cursor blink timer.

From: Ming Lei
Date: Wed May 18 2016 - 20:27:57 EST


On Thu, May 19, 2016 at 4:24 AM, Scot Doyle <lkml14@xxxxxxxxxxxxx> wrote:
> On Wed, 18 May 2016, Ming Lei wrote:
>> On Wed, May 18, 2016 at 4:49 AM, Pavel Machek <pavel@xxxxxx> wrote:
>> > On Tue 2016-05-17 11:41:04, David Daney wrote:
>> >> From: David Daney <david.daney@xxxxxxxxxx>
>> >>
>> >> We are getting somewhat random soft lockups with this signature:
>> >>
>> >> [ 86.992215] [<fffffc00080935e0>] el1_irq+0xa0/0x10c
>> >> [ 86.997082] [<fffffc000841822c>] cursor_timer_handler+0x30/0x54
>> >> [ 87.002991] [<fffffc000810ec44>] call_timer_fn+0x54/0x1a8
>> >> [ 87.008378] [<fffffc000810ef88>] run_timer_softirq+0x1c4/0x2bc
>> >> [ 87.014200] [<fffffc000809077c>] __do_softirq+0x114/0x344
>> >> [ 87.019590] [<fffffc00080af45c>] irq_exit+0x74/0x98
>> >> [ 87.024458] [<fffffc00080fac20>] __handle_domain_irq+0x98/0xfc
>> >> [ 87.030278] [<fffffc000809056c>] gic_handle_irq+0x94/0x190
>> >>
>> >> This is caused by the vt visual_init() function calling into
>> >> fbcon_init() with a vc_cur_blink_ms value of zero. This is a
>> >> transient condition, as it is later set to a non-zero value. But, if
>> >> the timer happens to expire while the blink rate is zero, it goes into
>> >> an endless loop, and we get soft lockup.
>> >>
>> >> The fix is to initialize vc_cur_blink_ms before calling the con_init()
>> >> function.
>> >>
>> >> Signed-off-by: David Daney <david.daney@xxxxxxxxxx>
>> >> Cc: stable@xxxxxxxxxxxxxxx
>> >
>> > Acked-by: Pavel Machek <pavel@xxxxxx>
>>
>> Tested-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
>>
>> Thanks David and Pavel for making it work!
>>
>> >
>> > (And it is amazing how many problems configurable blink speed caused).
>> >
>> > Thanks!
>> > Pavel
>> >
>
>
> Dann, Ming and David, thank you so much for all of your effort.
>
> There were three other reports in the past year, each leading to their own
> patch, of boot lockups occurring when the cursor flash timer was set using
> an ops->cur_blink_jiffies value of 0. I plan to propose a patch within
> the next day that will prevent this for all code paths.

Given that this issue makes the system unusable, I suggest merging David's
one-line patch first. Then you can take the time to work out a 'perfect'
solution that addresses all of the reports of this kind from the last year.

Does it make sense?
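
To make the failure mode concrete, here is a toy user-space sketch. It is
not kernel code, and it is neither David's patch nor the patch Scot plans
to propose. It only models a handler that re-arms itself with a delay of
zero, and a guard that falls back to a non-zero delay. The names
safe_blink_delay(), handler_runs_in_one_pass() and DEFAULT_BLINK_JIFFIES,
as well as the fallback value of 50, are assumptions made up for this
illustration.

```c
/* Hypothetical fallback interval; the value is an assumption for this sketch. */
#define DEFAULT_BLINK_JIFFIES 50UL

/* Hypothetical guard: never hand a zero re-arm delay to the timer. */
unsigned long safe_blink_delay(unsigned long blink_jiffies)
{
	return blink_jiffies ? blink_jiffies : DEFAULT_BLINK_JIFFIES;
}

/*
 * Toy model of one pass of expired-timer processing at a fixed time "now".
 * The handler re-arms itself for now + delay, so with delay == 0 the timer
 * is immediately expired again and is run again in the same pass.
 * Returns how many times the handler ran, capped at "cap" so the model
 * itself terminates.
 */
unsigned int handler_runs_in_one_pass(unsigned long delay, unsigned int cap)
{
	unsigned long now = 1000;
	unsigned long expires = now;
	unsigned int runs = 0;

	while (expires <= now && runs < cap) {
		runs++;			/* handler body would run here */
		expires = now + delay;	/* handler re-arms itself */
	}
	return runs;
}
```

In this model, handler_runs_in_one_pass(0, cap) only stops when it reaches
the cap, while handler_runs_in_one_pass(safe_blink_delay(0), cap) runs the
handler once. David's patch avoids the zero value at its source by
initializing vc_cur_blink_ms before con_init() is called; a guard like the
one above is just one possible shape for a fix covering the other paths.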


Thanks,
Ming