Re: [PATCH 2/3] livepatch: send a fake signal to all blocking tasks
From: Miroslav Benes
Date: Thu May 18 2017 - 14:14:48 EST
On Thu, 18 May 2017, Oleg Nesterov wrote:
> I didn't see other patches in series, not sure I understand...
There is nothing relevant to this patch, I think. I did not want to bother
you with it.
> On 05/18, Miroslav Benes wrote:
> >
> > The very safe marking is done in entry.S on syscall and
> > interrupt/exception exit paths, and in the stack checking functions of
> > livepatch. TIF_PATCH_PENDING is cleared and the next
> > recalc_sigpending() drops TIF_SIGPENDING.
>
> Confused. The task can't return from do_signal() if signal_pending() is
> true, thus it will spin forever if klp_patch_pending(current) is true.
> "forever" means until something else clears TIF_PATCH_PENDING, of course.
>
> exit_to_usermode_loop() calls do_signal(), then klp_update_patch_state().
> So it won't be cleared here.
Ok, so maybe I misunderstand the code. I see the loop in
exit_to_usermode_loop() for processing ALLWORK_MASK. There we call
do_signal(). We go to get_signal(). The infinite loop there is relevant
for us. We call dequeue_signal(). There, if I am not mistaken,
__dequeue_signal() would return 0 in our case, because there is no real
signal pending and thus nothing in the signal data structures.
recalc_sigpending() is called and TIF_SIGPENDING is preserved there (I
presume TIF_PATCH_PENDING is set). signr is zero, so dequeue_signal() returns
0. Back in get_signal() we break out of the loop and zero is returned. Then
do_signal() may or may not restart the syscall.
If not, we get back to exit_to_usermode_loop() and TIF_PATCH_PENDING is
cleared. Yes, it is true that TIF_SIGPENDING is still set and we get to
do_signal() once more. But for the last time.
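
The non-restart path described above can be sketched as a small user-space
model. This is only a toy, not kernel code: every model_* name and M_* flag
is hypothetical, the loop structure is simplified, and
model_recalc_sigpending() encodes the assumption stated above that
TIF_SIGPENDING is preserved while TIF_PATCH_PENDING is set. It counts how
often do_signal() would be entered for a given set of initial flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the thread-info flags; the bit values are arbitrary. */
#define M_SIGPENDING    (1u << 0)	/* models TIF_SIGPENDING */
#define M_PATCH_PENDING (1u << 1)	/* models TIF_PATCH_PENDING */

struct model_task {
	unsigned int flags;
	bool real_signal;	/* true if a real signal is queued */
	int do_signal_calls;	/* how often the model entered do_signal() */
};

/* Assumption: TIF_SIGPENDING survives as long as a patch is pending. */
static void model_recalc_sigpending(struct model_task *t)
{
	if (t->real_signal || (t->flags & M_PATCH_PENDING))
		t->flags |= M_SIGPENDING;
	else
		t->flags &= ~M_SIGPENDING;
}

/* Returns a non-zero signal number only if a real signal was queued. */
static int model_dequeue_signal(struct model_task *t)
{
	int signr = 0;

	if (t->real_signal) {
		signr = 1;	/* arbitrary non-zero number for the model */
		t->real_signal = false;
	}
	model_recalc_sigpending(t);
	return signr;
}

/* For a fake signal, the dequeue yields 0 and nothing is delivered. */
static void model_do_signal(struct model_task *t)
{
	t->do_signal_calls++;
	(void)model_dequeue_signal(t);
}

static void model_klp_update_patch_state(struct model_task *t)
{
	t->flags &= ~M_PATCH_PENDING;
}

/* do_signal() runs before klp_update_patch_state(), as quoted above. */
static void model_exit_to_usermode_loop(struct model_task *t)
{
	unsigned int cached = t->flags;

	while (cached & (M_SIGPENDING | M_PATCH_PENDING)) {
		if (cached & M_SIGPENDING)
			model_do_signal(t);
		if (cached & M_PATCH_PENDING)
			model_klp_update_patch_state(t);
		cached = t->flags;	/* re-read the flags, then re-check */
	}
}

/* Run the loop from the given flags; return the number of do_signal() calls. */
static int model_run(unsigned int flags)
{
	struct model_task t = { .flags = flags };

	model_exit_to_usermode_loop(&t);
	assert(t.flags == 0);	/* the loop only exits once both flags are clear */
	return t.do_signal_calls;
}
```

In this model, model_run(M_SIGPENDING | M_PATCH_PENDING) returns 2: the
first do_signal() leaves the signal flag set because the patch is still
pending, and the second one clears it. The model does not cover the
syscall-restart case or a patch flag that is set after do_signal() has
already been entered.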
If the syscall is restarted, it may be different. I have to think about
this one. But...
> Even if you change the order, this won't help unless I missed something,
> TIF_PATCH_PENDING can be set when this task has already entered do_signal().
...I think it could be solved with this anyway. And of course it should
solve the double call to do_signal() I described above.
Damn, I fixed exactly this in SLES a year or so ago and there is a note that
I did the same in the proposed version for upstream. It must have fallen
through the cracks.
So, am I wrong somewhere? It could be anywhere, because it is quite
confusing.
Regards,
Miroslav