Re: [GIT PULL] General notification queue and key notifications

From: Christian Brauner
Date: Wed Jun 10 2020 - 05:56:19 EST


On Tue, Jun 02, 2020 at 04:55:04PM +0100, David Howells wrote:
> Date: Tue, 02 Jun 2020 16:51:44 +0100
>
> Hi Linus,
>
> Can you pull this, please? It adds a general notification queue concept
> and adds an event source for keys/keyrings, such as linking and unlinking
> keys and changing their attributes.
>
> Thanks to Debarshi Ray, we do have a pull request to use this to fix a
> problem with gnome-online-accounts - as mentioned last time:
>
> https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47
>
> Without this, g-o-a has to constantly poll a keyring-based kerberos cache
> to find out if kinit has changed anything.
>
> [[ With regard to the mount/sb notifications and fsinfo(), Karel Zak and

The mount/sb notification and fsinfo() stuff is something we'd like to
use. (And then later extend to allow for supervised mounts where a
container manager can supervise the mounts of an unprivileged
container.)
I'm not sure whether the mount notifications are already part of this PR.

Christian

> Ian Kent have been working on making libmount use them, preparatory to
> working on systemd:
>
> https://github.com/karelzak/util-linux/commits/topic/fsinfo
> https://github.com/raven-au/util-linux/commits/topic/fsinfo.public
>
> Development has stalled briefly due to other commitments, so I'm not
> sure I can ask you to pull those parts of the series for now. Christian
> Brauner would like to use them in lxc, but hasn't started.
> ]]
>
>
> LSM hooks are included:
>
> (1) A set of hooks are provided that allow an LSM to rule on whether or
> not a watch may be set. Each of these hooks takes a different
> "watched object" parameter, so they're not really shareable. The LSM
> should use current's credentials. [Wanted by SELinux & Smack]
>
> (2) A hook is provided to allow an LSM to rule on whether or not a
> particular message may be posted to a particular queue. This is given
> the credentials from the event generator (which may be the system) and
> the watch setter. [Wanted by Smack]
>
> I've provided SELinux and Smack with implementations of some of these hooks.
>
>
> WHY
> ===
>
> Key/keyring notifications are desirable because if you have your kerberos
> tickets in a file/directory, your Gnome desktop will monitor that using
> something like fanotify and tell you if your credentials cache changes.
>
> However, we also have the ability to cache your kerberos tickets in the
> session, user or persistent keyring so that it isn't left around on disk
> across a reboot or logout. Keyrings, however, cannot currently be
> monitored asynchronously, so the desktop has to poll for it - not so good
> on a laptop. This facility will allow the desktop to avoid the need to
> poll.
>
>
> DESIGN DECISIONS
> ================
>
> (1) The notification queue is built on top of a standard pipe. Messages
> are effectively spliced in. The pipe is opened with a special flag:
>
> pipe2(fds, O_NOTIFICATION_PIPE);
>
> The special flag has the same value as O_EXCL (which doesn't seem like
> it will ever be applicable in this context)[?]. It is given up front
> to make it a lot easier to prohibit splice and co. from accessing the
> pipe.
>
> [?] Should this be done some other way? I'd rather not use up a new
> O_* flag if I can avoid it - should I add a pipe3() system call
> instead?
>
> The pipe is then configured:
>
> ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
> ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
>
> Messages are then read out of the pipe using read().
>
> (2) It should be possible to allow write() to insert data into the
> notification pipes too, but this is currently disabled as the kernel
> has to be able to insert messages into the pipe *without* holding
> pipe->mutex and the code to make this work needs careful auditing.
>
> (3) sendfile(), splice() and vmsplice() are disabled on notification pipes
> because of the pipe->mutex issue and also because they sometimes want
> to revert what they just did - but one or more notification messages
> might've been interleaved in the ring.
>
> (4) The kernel inserts messages with the wait queue spinlock held. This
> means that pipe_read() and pipe_write() have to take the spinlock to
> update the queue pointers.
>
> (5) Records in the buffer are binary, typed and have a length so that they
> can be of varying size.
>
> This allows multiple heterogeneous sources to share a common buffer;
> there are 16 million types available, of which I've used just a few,
> so there is scope for others to be used. Tags may be specified when a
> watchpoint is created to help distinguish the sources.
>
> (6) Records are filterable as types have up to 256 subtypes that can be
> individually filtered. Other filtration is also available.
>
> (7) Notification pipes don't interfere with each other; each may be bound
> to a different set of watches. Any particular notification will be
> copied to all the queues that are currently watching for it - and only
> those that are watching for it.
>
> (8) When recording a notification, the kernel will not sleep, but will
> rather mark a queue as having lost a message if there's insufficient
> space. read() will fabricate a loss notification message at an
> appropriate point later.
>
> (9) The notification pipe is created and then watchpoints are attached to
> it, using one of:
>
> keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
> watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
> watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);
>
> where in each case the fd argument indicates the queue and the number
> after it is a tag between 0 and 255.
>
> (10) Watches are removed if either the notification pipe is destroyed or
> the watched object is destroyed. In the latter case, a message will
> be generated indicating the enforced watch removal.
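
For readers following along, here is a minimal sketch of the setup in
item (1): open the pipe with the special flag, then size the queue.
open_notification_pipe() is a hypothetical helper name. The fallback
value for O_NOTIFICATION_PIPE follows the text above (same as O_EXCL);
the fallback ioctl number is an assumption meant to mirror
include/uapi/linux/watch_queue.h from this series. On a kernel without
the feature the call is expected to fail and leave errno set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed values, see above). */
#ifndef O_NOTIFICATION_PIPE
#define O_NOTIFICATION_PIPE O_EXCL
#endif
#ifndef IOC_WATCH_QUEUE_SET_SIZE
#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60)
#endif

/*
 * Hypothetical helper: create a notification pipe and set its depth.
 * Returns 0 and fills fds[] on success, or -1 with errno set on failure
 * (for example when the running kernel lacks notification pipes).
 */
int open_notification_pipe(int fds[2], unsigned int queue_depth)
{
	assert(fds != NULL);

	if (pipe2(fds, O_NOTIFICATION_PIPE) == -1)
		return -1;

	/* Configure the queue through the pipe, as in the text above. */
	if (ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE,
		  (unsigned long)queue_depth) == -1) {
		int saved = errno;	/* close() may clobber errno */

		close(fds[0]);
		close(fds[1]);
		errno = saved;
		return -1;
	}
	return 0;
}
```

On success, fds[0] is the end to read() messages from and fds[1] is the
descriptor passed to keyctl_watch_key() and friends.
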
>
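
Item (5) says records are typed and carry their own length, so a reader
can step through whatever read() returned one record at a time. Below is
a rough sketch of that walk. struct wn_header and the WN_* masks are
hypothetical local names; the layout (24-bit type, 8-bit subtype, and an
info word whose low 7 bits hold the record length in bytes and whose
bits 8-15 hold the tag) is my assumption of what the uapi header in this
series defines, so check it against include/uapi/linux/watch_queue.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed record header layout (hypothetical mirror of the uapi one). */
struct wn_header {
	uint32_t type    : 24;	/* room for about 16 million types */
	uint32_t subtype : 8;	/* up to 256 subtypes per type */
	uint32_t info;		/* length, tag and flags packed together */
};

#define WN_INFO_LENGTH		0x0000007fu	/* record length in bytes */
#define WN_INFO_ID		0x0000ff00u	/* tag given at watch time */
#define WN_INFO_ID_SHIFT	8

/* Decoded view of one record, for the caller's convenience. */
struct wn_record {
	uint32_t type;
	uint32_t subtype;
	uint32_t tag;
	size_t len;
};

/*
 * Decode the record at buf[*off] and advance *off past it.
 * Returns 1 if a record was decoded, 0 at the end of the buffer and
 * -1 if the header is truncated or the length field is implausible.
 */
int wn_next(const unsigned char *buf, size_t len, size_t *off,
	    struct wn_record *out)
{
	struct wn_header hdr;
	size_t rec_len;

	assert(buf != NULL && off != NULL && out != NULL);

	if (*off >= len)
		return 0;
	if (len - *off < sizeof(hdr))
		return -1;

	/* Copy out the header so alignment of buf does not matter. */
	memcpy(&hdr, buf + *off, sizeof(hdr));

	rec_len = hdr.info & WN_INFO_LENGTH;
	if (rec_len < sizeof(hdr) || rec_len > len - *off)
		return -1;

	out->type = hdr.type;
	out->subtype = hdr.subtype;
	out->tag = (hdr.info & WN_INFO_ID) >> WN_INFO_ID_SHIFT;
	out->len = rec_len;
	*off += rec_len;
	return 1;
}
```

A caller would loop on wn_next() until it returns 0, using the tag to
tell which watch a record came from.
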
>
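
On item (6): 256 individually filterable subtypes fit naturally in a
256-bit bitmap, i.e. eight 32-bit words. The sketch below only
illustrates that idea. struct subtype_filter and both helpers are
hypothetical names, not the filter structure passed to
IOC_WATCH_QUEUE_SET_FILTER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per subtype: 8 words x 32 bits = 256 subtypes. */
struct subtype_filter {
	uint32_t bits[8];
};

/* Mark a subtype as wanted. */
void subtype_filter_allow(struct subtype_filter *f, uint8_t subtype)
{
	assert(f != NULL);
	f->bits[subtype >> 5] |= UINT32_C(1) << (subtype & 31);
}

/* Return true if records of this subtype should pass the filter. */
bool subtype_filter_match(const struct subtype_filter *f, uint8_t subtype)
{
	assert(f != NULL);
	return (f->bits[subtype >> 5] >> (subtype & 31)) & 1u;
}
```

A zero-initialised filter passes nothing; each subtype of interest is
then switched on with subtype_filter_allow().
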
> Things I want to avoid:
>
> (1) Introducing features that make the core VFS dependent on the network
> stack or networking namespaces (ie. usage of netlink).
>
> (2) Dumping all this stuff into dmesg and having a daemon that sits there
> parsing the output and distributing it, as this then puts the
> responsibility for security into userspace and makes handling
> namespaces tricky. Further, dmesg might not exist or might be
> inaccessible inside a container.
>
> (3) Letting users see events they shouldn't be able to see.
>
>
> TESTING AND MANPAGES
> ====================
>
> (*) The keyutils tree has a pipe-watch branch that has keyctl commands for
> making use of notifications. Proposed manual pages can also be found
> on this branch, though a couple of them really need to go to the main
> manpages repository instead.
>
> If the kernel supports the watching of keys, then running "make test"
> on that branch will cause the testing infrastructure to spawn a
> monitoring process on the side that monitors a notifications pipe for
> all the key/keyring changes induced by the tests and they'll all be
> checked off to make sure they happened.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch
>
> (*) A test program is provided (samples/watch_queue/watch_test) that can
> be used to monitor for keyring, mount and superblock events.
> Information on the notifications is simply logged to stdout.
>
> Thanks,
> David
> ---
> The following changes since commit b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce:
>
> Linux 5.7-rc6 (2020-05-17 16:48:37 -0700)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-20200601
>
> for you to fetch changes up to a8478a602913dc89a7cd2060e613edecd07e1dbd:
>
> smack: Implement the watch_key and post_notification hooks (2020-05-19 15:47:38 +0100)
>
> ----------------------------------------------------------------
> Notifications over pipes + Keyring notifications
>
> ----------------------------------------------------------------
> David Howells (12):
> uapi: General notification queue definitions
> security: Add a hook for the point of notification insertion
> pipe: Add O_NOTIFICATION_PIPE
> pipe: Add general notification queue support
> security: Add hooks to rule on setting a watch
> watch_queue: Add a key/keyring notification facility
> Add sample notification program
> pipe: Allow buffers to be marked read-whole-or-error for notifications
> pipe: Add notification lossage handling
> keys: Make the KEY_NEED_* perms an enum rather than a mask
> selinux: Implement the watch_key security hook
> smack: Implement the watch_key and post_notification hooks
>
> Documentation/security/keys/core.rst | 57 ++
> Documentation/userspace-api/ioctl/ioctl-number.rst | 1 +
> Documentation/watch_queue.rst | 339 +++++++++++
> fs/pipe.c | 242 +++++---
> fs/splice.c | 12 +-
> include/linux/key.h | 33 +-
> include/linux/lsm_audit.h | 1 +
> include/linux/lsm_hook_defs.h | 9 +
> include/linux/lsm_hooks.h | 14 +
> include/linux/pipe_fs_i.h | 27 +-
> include/linux/security.h | 30 +-
> include/linux/watch_queue.h | 127 ++++
> include/uapi/linux/keyctl.h | 2 +
> include/uapi/linux/watch_queue.h | 104 ++++
> init/Kconfig | 12 +
> kernel/Makefile | 1 +
> kernel/watch_queue.c | 659 +++++++++++++++++++++
> samples/Kconfig | 6 +
> samples/Makefile | 1 +
> samples/watch_queue/Makefile | 7 +
> samples/watch_queue/watch_test.c | 186 ++++++
> security/keys/Kconfig | 9 +
> security/keys/compat.c | 3 +
> security/keys/gc.c | 5 +
> security/keys/internal.h | 38 +-
> security/keys/key.c | 38 +-
> security/keys/keyctl.c | 115 +++-
> security/keys/keyring.c | 20 +-
> security/keys/permission.c | 31 +-
> security/keys/process_keys.c | 46 +-
> security/keys/request_key.c | 4 +-
> security/security.c | 22 +-
> security/selinux/hooks.c | 51 +-
> security/smack/smack_lsm.c | 112 +++-
> 34 files changed, 2185 insertions(+), 179 deletions(-)
> create mode 100644 Documentation/watch_queue.rst
> create mode 100644 include/linux/watch_queue.h
> create mode 100644 include/uapi/linux/watch_queue.h
> create mode 100644 kernel/watch_queue.c
> create mode 100644 samples/watch_queue/Makefile
> create mode 100644 samples/watch_queue/watch_test.c
>