Re: [PATCH] fanotify: allow freeze on suspend when waiting for response from userspace

From: Orion Poplawski
Date: Sat Dec 29 2018 - 23:00:42 EST


On 12/29/18 3:34 PM, Orion Poplawski wrote:
On 12/29/18 3:04 PM, Orion Poplawski wrote:
On Thu 22-02-18 15:14:54, Kunal Shubham wrote:
>> On Fri 16-02-18 15:14:40, t.vivek@xxxxxxxxxxx wrote:
>> From: Vivek Trivedi <t.vivek@xxxxxxxxxxx>
>>
>> If fanotify userspace response server thread is frozen first,
>> it may fail to send response from userspace to kernel space listener.
>> In this scenario, fanotify response listener will never get response
>> from userspace and fail to suspend.
>>
>> Use freeze-friendly wait API to handle this issue.
>>
>> Same problem was reported here:
>> https://bbs.archlinux.org/viewtopic.php?id=232270
>>
>> Freezing of tasks failed after 20.005 seconds
>> (1 tasks refusing to freeze, wq_busy=0)
>>
>> Backtrace:
>> [<c0582f80>] (__schedule) from [<c05835d0>] (schedule+0x4c/0xa4)
>> [<c0583584>] (schedule) from [<c01cb648>] (fanotify_handle_event+0x1c8/0x218)
>> [<c01cb480>] (fanotify_handle_event) from [<c01c8238>] (fsnotify+0x17c/0x38c)
>> [<c01c80bc>] (fsnotify) from [<c02676dc>] (security_file_open+0x88/0x8c)
>> [<c0267654>] (security_file_open) from [<c01854b0>] (do_dentry_open+0xc0/0x338)
>> [<c01853f0>] (do_dentry_open) from [<c0185a38>] (vfs_open+0x54/0x58)
>> [<c01859e4>] (vfs_open) from [<c0195480>] (do_last.isra.10+0x45c/0xcf8)
>> [<c0195024>] (do_last.isra.10) from [<c0196140>] (path_openat+0x424/0x600)
>> [<c0195d1c>] (path_openat) from [<c0197498>] (do_filp_open+0x3c/0x98)
>> [<c019745c>] (do_filp_open) from [<c0186b44>] (do_sys_open+0x120/0x1e4)
>> [<c0186a24>] (do_sys_open) from [<c0186c30>] (SyS_open+0x28/0x2c)
>> [<c0186c08>] (SyS_open) from [<c0010200>] (__sys_trace_return+0x0/0x20)
>
> Yeah, good catch.
>
>> @@ -63,7 +64,9 @@ static int fanotify_get_response(struct fsnotify_group *group,
>>
>>         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>>
>> -       wait_event(group->fanotify_data.access_waitq, event->response);
>> +       while (!event->response)
>> +               wait_event_freezable(group->fanotify_data.access_waitq,
>> +                                    event->response);
>
> But if the process gets a signal while waiting, we will just livelock the
> kernel in this loop as wait_event_freezable() will keep returning
> ERESTARTSYS. So you need to be a bit more clever here...

Hi Jack,
Thanks for the quick review.
To avoid the livelock issue, is it fine to use the change below? If you agree, I will send a v2 patch.

@@ -63,7 +64,11 @@ static int fanotify_get_response(struct fsnotify_group *group,

        pr_debug("%s: group=%p event=%p\n", __func__, group, event);

-       wait_event(group->fanotify_data.access_waitq, event->response);
+       while (!event->response) {
+               if (wait_event_freezable(group->fanotify_data.access_waitq,
+                                        event->response))
+                       flush_signals(current);
+       }

Hum, I don't think this is correct either as this way if any signal was
delivered while waiting for fanotify response, we'd just lose it while
previously it has been properly handled. So what I think needs to be done
is that we just use wait_event_freezable() and propagate non-zero return
value (-ERESTARTSYS) up to the caller to handle the signal and restart the
syscall as necessary.

                                Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
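
The three waiting strategies discussed above can be compared with a small userspace model. This is only a sketch, not kernel code: `model_task`, `model_wait_interruptible()` and the `strategy_*()` helpers are hypothetical names invented for illustration, and the model assumes nothing more than what the thread states, namely that the interruptible wait keeps failing with -ERESTARTSYS for as long as a signal is pending.

```c
#include <assert.h>
#include <stdbool.h>

#define ERESTARTSYS 512   /* numeric value of the kernel-internal error code */
#define MAX_SPINS   1000  /* cap so the livelock model terminates */

/* Hypothetical stand-in for a task: one pending signal, one response flag. */
struct model_task {
    bool signal_pending;
    bool response_ready;
};

/* Models an interruptible wait: returns 0 once the condition holds,
 * -ERESTARTSYS while a signal is pending. */
static int model_wait_interruptible(struct model_task *t)
{
    assert(t);
    if (t->response_ready)
        return 0;
    if (t->signal_pending)
        return -ERESTARTSYS;
    t->response_ready = true;   /* no signal: pretend userspace answered */
    return 0;
}

/* v1 patch: retry blindly. With a pending signal the loop never makes
 * progress; returns the number of iterations spent spinning. */
static int strategy_retry(struct model_task *t)
{
    int spins = 0;

    while (!t->response_ready && spins < MAX_SPINS) {
        model_wait_interruptible(t);
        spins++;
    }
    return spins;
}

/* Proposed v2: discard the signal and retry. The loop ends, but the
 * signal is gone for good. */
static int strategy_flush(struct model_task *t)
{
    while (!t->response_ready) {
        if (model_wait_interruptible(t))
            t->signal_pending = false;  /* models flush_signals(current) */
    }
    return 0;
}

/* Suggested direction: hand the error to the caller, signal untouched. */
static int strategy_propagate(struct model_task *t)
{
    return model_wait_interruptible(t);
}
```

Starting each strategy from a task with a pending signal and no response yet, the retry loop spins until the cap, the flush variant finishes but leaves `signal_pending` cleared, and the propagate variant returns -ERESTARTSYS with the signal still pending for the caller to handle.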

Is there any progress here? This has become a real pain for us while running BitDefender on EL7 laptops. I tried applying the following to the EL7 kernel:

diff -up linux-3.10.0-957.1.3.el7.x86_64/fs/notify/fanotify/fanotify.c.orig kernel-3.10.0-957.1.3.el7/linux-3.10.0-957.1.3.el7.x86_64/fs/notify/fanotify/fanotify.c
--- linux-3.10.0-957.1.3.el7.x86_64/fs/notify/fanotify/fanotify.c.orig 2018-11-15 10:07:13.000000000 -0700
+++ linux-3.10.0-957.1.3.el7.x86_64/fs/notify/fanotify/fanotify.c 2018-12-28 15:44:26.452895337 -0700
@@ -9,6 +9,7 @@
 #include <linux/types.h>
 #include <linux/wait.h>
 #include <linux/audit.h>
+#include <linux/freezer.h>

 #include "fanotify.h"

@@ -64,7 +65,12 @@ static int fanotify_get_response(struct

        pr_debug("%s: group=%p event=%p\n", __func__, group, event);

-       wait_event(group->fanotify_data.access_waitq, event->response);
+       while (!event->response) {
+               ret = wait_event_freezable(group->fanotify_data.access_waitq,
+                                          event->response);
+               if (ret < 0)
+                       return ret;
+       }

        /* userspace responded, convert to something usable */
        switch (event->response & ~FAN_AUDIT) {

but I get a kernel panic shortly after logging in to the system.


I tried a slightly different patch to see if setting event->response = 0 helps and to confirm the return value of wait_event_freezable:

--- linux-3.10.0-957.1.3.el7/fs/notify/fanotify/fanotify.c 2018-11-15 10:07:13.000000000 -0700
+++ linux-3.10.0-957.1.3.el7.fanotify.x86_64/fs/notify/fanotify/fanotify.c 2018-12-29 16:05:53.451125868 -0700
@@ -9,6 +9,7 @@
 #include <linux/types.h>
 #include <linux/wait.h>
 #include <linux/audit.h>
+#include <linux/freezer.h>

 #include "fanotify.h"

@@ -64,7 +65,15 @@

        pr_debug("%s: group=%p event=%p\n", __func__, group, event);

-       wait_event(group->fanotify_data.access_waitq, event->response);
+       while (!event->response) {
+               ret = wait_event_freezable(group->fanotify_data.access_waitq,
+                                          event->response);
+               if (ret < 0) {
+                       pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
+                                group, event, ret);
+                       goto finish;
+               }
+       }

        /* userspace responded, convert to something usable */
        switch (event->response & ~FAN_AUDIT) {
@@ -75,7 +84,7 @@
        default:
                ret = -EPERM;
        }
-
+finish:
        /* Check if the response should be audited */
        if (event->response & FAN_AUDIT)
                audit_fanotify(event->response & ~FAN_AUDIT);


and I enabled the pr_debug. This does indeed trigger the panic:


[ 4181.113781] fanotify_get_response: group=ffff9e3af9952b00 event=ffff9e3aea426c80 about to return ret=-512
[ 4181.113788] ------------[ cut here ]------------
[ 4181.113804] WARNING: CPU: 0 PID: 24290 at fs/notify/notification.c:84 fsnotify_destroy_event+0x6b/0x70

So it appears that the notify system cannot handle simply passing -ERESTARTSYS back up the stack here.
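
One possible reading of the warning, which the trace above does not confirm: if the check in fsnotify_destroy_event is of the "event is still linked on a list" kind, then returning early from the wait would hand the caller an event that userspace has not yet consumed. The userspace model below sketches that scenario only. All names in it (`model_event`, `model_destroy_event()`, `model_handle_event()`, the `dequeue_on_interrupt` switch) are hypothetical and not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

#define ERESTARTSYS 512   /* numeric value of the kernel-internal error code */

/* Hypothetical event: tracks only whether it still sits on a queue. */
struct model_event {
    bool queued;   /* still linked on a notification/access list */
    bool warned;   /* set when the destructor sees a queued event */
    int response;
};

/* Models a destructor that complains when the event is still queued. */
static void model_destroy_event(struct model_event *e)
{
    assert(e);
    if (e->queued)
        e->warned = true;
}

/* Models the permission-event path: queue the event, wait for an answer,
 * then destroy the event. An interrupted wait returns early; unless the
 * event is explicitly dequeued first, the destructor sees it still queued. */
static int model_handle_event(struct model_event *e, bool interrupted,
                              bool dequeue_on_interrupt)
{
    int ret = 0;

    e->queued = true;
    if (interrupted) {
        ret = -ERESTARTSYS;
        if (dequeue_on_interrupt)
            e->queued = false;   /* assumed cleanup step */
    } else {
        e->queued = false;       /* userspace read the event and answered */
        e->response = 1;
    }
    model_destroy_event(e);
    return ret;
}
```

In this model the uninterrupted path and the interrupted path with explicit dequeueing both destroy the event cleanly, while the interrupted path without dequeueing sets `warned`. Under that assumption, propagating -ERESTARTSYS would need matching cleanup of the queued event before the caller frees it.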

--
Orion Poplawski
Manager of NWRA Technical Systems 720-772-5637
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane orion@xxxxxxxx
Boulder, CO 80301 https://www.nwra.com/